PREFACE


The present work, entitled "Quantitative Structure Activity
Studies As A Tool For Molecular Designing of More Potent Drugs", deals
with 2D and 3D QSAR studies of some steroidal and non-steroidal estrogen
receptor modulators. The thesis has been divided into four chapters.

Chapter-1 has been divided into three sections. Section-A gives
an insight into the various aspects of drug design and computer-aided
molecular modelling. Section-B describes the QSAR (2D) methodology and
parametrization; 3D-QSAR approaches have also been incorporated in this
section. Section-C includes the regression analysis method employed in
this work for obtaining the QSARs, along with a detailed description of
the various descriptors used in the present thesis.

Chapter-2 comprises an introductory overview of estrogen
receptors and of the various selective estrogen receptor modulators
(SERMs) on which the present work has been carried out.

Chapter-3 has been divided into two sections. Section-A presents
the results, discussion and conclusions for the SERMs placed on a common
relative binding affinity (RBA) scale, and Section-B presents the
results, discussion and conclusions for the SERMs whose binding affinity
values are expressed as IC50 values.

Chapter-4 comprises two sections. Section-A deals with the
introduction to 3D-QSAR using APEX-3D and CATALYST. Section-B includes
the results, discussion and conclusions for the estrogen receptor
modulators taken for the 3D-QSAR study. This section also includes
molecular modelling of some new potent estrogen receptor modulators.
Finally, the references and the certificate of the training programme on
QSAR and molecular modelling completed at CDRI, Lucknow are also
included.

The investigations incorporated in the thesis were carried out in
the Chemistry laboratory of the Department of Chemistry, University of
Allahabad. The results of the research work have not so far been
submitted in part or in full for any degree or diploma of any University.

A summary of the entire work incorporated is being submitted 
separately along with the thesis as required by the ordinance of the 
University of Allahabad for the award of D.Phil. Degree. 


Date: 
CHAPTER-1


INTRODUCTION



SECTION-A 


DRUG DESIGN 

Drugs are of major importance to human health and nutrition on
account of their action on biological and pathological processes. The use
of drugs has shown a considerable worldwide increase in recent years, and
this tendency is likely to continue in the near future. In earlier days,
purely randomized search procedures were involved in the discovery of new
drugs. But the randomized search is no longer effective, as it is too
time-consuming, guarantees too little success and is too expensive. The
chance of discovering a new agent has diminished to 1 in 10,000 and will
decrease even further. Development costs have risen to more than 40
million dollars per new drug. This necessitated the development of a new
logical and scientific approach to the discovery of new drugs, which is
known as "Drug Design".


Drug design is an integrated, developing discipline which
portends an era of tailored drugs. "Tailoring of a drug" means alteration
of various physical and chemical properties of the drug molecule through
insertion of newer functional moieties or by the replacement of groups
already present by others, for example isosteric replacement. Tailoring
also includes various configurational and stereochemical changes to the
drug molecule which affect its flexibility and overall dimensions, for
example ring fusion, ring fission, formation of a higher or lower
homologue, introduction of an optically active centre, formation of a
double bond leading to geometrical isomerism, and introduction of a bulky
group leading to restricted rotation. Drug design involves the study of
the effects of biologically active compounds on the basis of molecular
interactions, in terms of molecular structure or the physico-chemical
properties involved. It studies the processes by which drugs produce
their effects, how they react with the protoplasm to elicit a particular
pharmacological effect or response, and how they are modified or
detoxified, metabolized or eliminated by the organism.

Thus, drug design involves either total innovation of a lead or
optimization of an already available lead. The lead is a prototype
compound that has the desired biological or pharmacological activity but
may have many undesirable characteristics. So the current trend in drug
design is to develop new clinically effective agents through structural
modification of the lead nucleus.

The range of chemical compounds is virtually unlimited, and it
will never be possible to explore it fully. This is particularly true in
the search for new therapeutics, because the multitude of biological
systems and the tests they require have added a new dimension. Therefore,
any procedure is unavoidably bound to consist of the selection of
subsystems from a large group of compounds. A subsystem is in general
understood to represent a number of chemical compounds formed by
substantial variations in a given parent structure, referred to as the
lead compound. The most important approaches currently employed to obtain
subsystems or promising lead compounds are as follows:

1. Mass screening

2. Haphazard discovery 

3. Modification of natural compounds 

4. Discovery and exploration of side effects 

5. Investigation of drug metabolites 

6. Chemical modification of natural endogenous substances 

All these procedures are rather empirical, and much is left to
chance. The problem becomes even more serious where compounds for an
unknown action are to be designed.

Another approach to developing a new drug involves the
screening of a large number of new compounds of unusual structure for
indications of pharmacological response and action against bacteria and
viruses. Currently, screening centres all over the world are seriously
engaged in designing and testing newer compounds of miscellaneous
structures for their therapeutic value in hypertension, cancer and other
diseases. Tests are devised to gather the maximal amount of information
from the available samples. Thus, it is apparent that traditional
approaches to treating various diseases have ranged from natural products
to synthetic molecules. These traditional drugs act by interfering with
the action of disease-associated proteins or enzymes. They include
antimetabolites, inhibitors, substrates, analogs, etc. However, the
diseases caused by viruses appear at the genetic level and necessarily
require inhibition of the appearance of disease-associated proteins.

At this stage, appropriate theoretical methods might prove to be
of the utmost advantage in the development of drugs. Long ago, Crum-Brown
and Fraser laid the foundation of such a theoretical approach by
proposing that the biological activity of a compound is a function of its
chemical properties. With this concept, structure activity relationships
(SAR) are developed when a set of physico-chemical properties of a group
of congeners is found to explain variations in the biological response of
those compounds. This has resulted in the discovery, examination and
interpretation of structure activity relationships in a more systematic
way, which has led to the introduction of quantitative structure activity
relationships (QSAR).

1.1 MOLECULAR MODELLING

The field of molecular modelling is commonly thought of as
being composed of several interlinked activities, including molecular
graphics, computational chemistry, statistical modelling and, to some
degree, molecular data and information management. The molecular
graphics aspect represents the drug molecules and their associated
molecular properties in a visual way, so that one may gain greater insight
into their pharmacologic behaviour. The computational chemistry
component is concerned with the simulation of atomic and molecular
properties of compounds of medicinal interest through equations, and with
the numerical methods used to solve these equations on the computer.
Statistical modelling encompasses the search for quantitative
relationships between the structures or properties of a series of
compounds and their resultant biological activities. This aspect of the
medicinal chemistry enterprise is called Quantitative Structure Activity
Relationship (QSAR). The chemical data/information management part
organizes the properties of thousands of compounds into an extensive
database, capable of being searched for highly promising compounds with
the right combination of properties to make them candidates for
pharmacologic evaluation. Another component aids the chemist in the
synthesis of new drugs by providing strategies and choices of ways to
accomplish the organic synthesis of a series of drug candidates. Yet a
third component may help to organize the attendant molecular properties
of a series of compounds to make them easier to subject to statistical
analysis, such as QSAR.

The common component of all these activities is the computer. 
Thus, when all of the aforementioned activities are taken together, they
constitute the field of medicinal chemistry known as Computer-Assisted
Molecular Design (CAMD) or Computer-Assisted Drug Design (CADD).

1.2 COMPUTER-ASSISTED DRUG DESIGN (CADD) 

CADD means: 

(a) no recipe for patents 

(b) helping in the decision which additional derivatives should 
be synthesized (optimization) 

(c) helping in the detection of new leads and of exceptions.

(d) helping in our understanding of complex processes
involved in drug action (mechanism).

(e) helping in testing working hypotheses.

(f) helping to analyze multivariate data from various test 
systems. 

(g) forcing to perform disciplined and quantitative data 
analysis. 

(h) forcing to do far sighted experimental design. 

(i) demanding and supporting interdisciplinary co-operation. 

(j) etc. 

The drug discovery and lead optimization process is currently
dominated by developments in two fields:

(1) a 'rational design' based on structural information and
sophisticated computer methods to elucidate the structural
prerequisites for binding to a particular target.

(2) a 'random screening' using high-throughput screening
techniques to discover possible leads from large compound
libraries provided increasingly by combinatorial chemistry.

The above two approaches are complementary. Both provide
structural characteristics of a particular series of compounds, which can
be used to establish a SAR. The derived model helps to explain the
important relative differences within a compound series, suggests how to
improve their binding properties and assists in ranking and selecting
novel candidates for synthesis.


1.2.1 DRUG DISCOVERY AND DEVELOPMENT PROCESS


The process of drug discovery is a long, tedious and expensive
one. The steps involved are:

1. New lead discovery

2. Lead optimization

3. Preclinical lead development

4. Clinical lead development

5. Post-marketing surveillance

[1] New Lead Discovery 

There are a number of ways of discovering new leads:

(i) Isolation of active substances from natural products for 
example Penicillin. 

(ii) Derivation and application of structure activity data for 
example Cephalosporin. 

(iii) Structure directed molecular design for example Carbonic 
anhydrase inhibitors. 

(iv) Chemist's intuition, for example Enalapril.

(v) Modification of natural products, for example Thienamycin.

(vi) Broad screening of known synthetic compounds for 
example Sulfa drugs. 

[2] Lead Optimization 

It can be done in the following ways:

(i) Synthesis and testing of congeneric structures.

(ii) Development of models based on structure activity and/or
mechanism of action.

(iii) Calculation of physical properties and their correlation with
activity.

[3] Preclinical Lead Development 

It consists of the following studies:

(i) Drug formulation experiments. 

(ii) Dose ranging studies in animals. 

(iii) Animal safety studies. 

(iv) Drug delivery/elimination/metabolism studies. 

(v) Develop large-scale synthesis. 

[4] Clinical Development 

It requires the following steps:

(i) Small-scale safety and dose-ranging tests in healthy
humans.

(ii) Develop clinical study protocols and obtain approval.

(iii) Recruit clinical investigators and patients for study. 

(iv) Carry out the study. 

(v) Analyze and report results. 

[5] Post Marketing Surveillance 

It consists of the collection of usage and side-effect reports.

1.2.2 CONTRIBUTIONS AND ACHIEVEMENTS OF CADD


Drugs discovered up to now by classical methods were designed by
trial and error. The availability of CAMM systems has opened a new
horizon for the design of new drug molecules.

CADD can contribute not only to the design of potent compounds
but also to many steps in the development of a new drug from laboratory
to clinic. It helps in discovering new lead structures as well as in
lead optimization.

CADD provides:

(i) 3D-structure of molecules. 

(ii) Chemical and physical characteristics of the molecule. 

(iii) Comparison of the structure of one molecule with other 
different molecules. 

(iv) Visualization of complexes formed between different 
molecules. 

(v) Prediction of how related molecules might look.

A number of examples now exist that clearly show that CADD has
made major contributions to the drug discovery process:

(i) Design of thymidylate synthetase inhibitors as anticancer
agents (1991).

(ii) Design of HIV protease inhibitors as antiviral agents
(1994).

(iii) Design of neutrophil elastase inhibitors as antiglaucoma
agents (1989).

(iv) Discovery of novel sweeteners using a sweet taste receptor
model (1990).
1.2.3 REQUIREMENTS FOR CADD


There are two main requirements for molecular modelling
systems:

(1) Graphics Hardware 

The predominant systems for molecular modelling calculations are
workstations with the UNIX operating system, for example 3D-graphics
workstations from Silicon Graphics and Evans and Sutherland multi-picture
systems. But the entire range of computer hardware is being used for
CADD, such as:

(i) Desktop Macintosh

(ii) MS-DOS personal computers

(iii) Computer servers

(iv) Supercomputers such as Cray supercomputers.

(2) Software Packages 

A variety of commercial packages are available, ranging from $50
to $500 for PC-based systems. Unfortunately, at present there is no one
system that meets all the needs of the molecular modeller. The major
commercial molecular modelling software systems currently available are:
Catalyst, Concord, Chem-X, Amber, Frodo, Sybyl/Alchemy, Cerius2,
Apex-3D, etc.

1.2.4 APPROACHES USED IN CADD

Two approaches are mainly used in CADD:

(A) Direct drug design 

(B) Indirect drug design 
(A) Direct Drug Design

In direct drug design, the 3D features of the receptor site (X-ray
structures or a 3D model of an enzyme) are directly considered for the design
of new drug molecules. The 3D features of a receptor consist of electronic
distribution, steric features and hydrophobic potential. Based on the
lock-and-key fit of a drug molecule and the receptor, i.e. considering the
complementarity of the structures of the drug and the receptor, the drug
molecule is designed from the known structure of the receptor.

The approach is limited in use owing to the difficulty in determining the
exact structure of the receptors. Since the receptors are biomolecules, and
mostly complex in nature, the indirect drug design approach is more commonly
used.

(B) Indirect Drug Design 

In indirect drug design, the analysis is based on the comparison of
the stereochemical and physicochemical features of a set of known
active/inactive molecules and is interpreted in terms of complementarity with
the structure of the unknown receptor site. Structural analysis and
consideration of the differences between the series of biological and synthetic
molecules lead to the design of new substances along their path. The commonly
used techniques for indirect drug design are:

1. QSAR analysis

2. Molecular shape analysis 

3. Receptor surface model generation 

4. Pharmacophore mapping

5. Comparative Molecular Field Analysis (CoMFA) 

General steps involved in applying these techniques are: 

(i) Sketching of the molecules 

(ii) Energy minimization 

(iii) Conformational analysis and geometry optimization of the 
compounds. 
1.3 QSAR ANALYSIS


In medicinal chemistry, the early 1960s marked the emergence of
well-defined methods for the quantitative study of structure activity
relationships, with the publication of the linear multiple regression
model by Hansch and Fujita, the additive model by Free and Wilson, and
the similar interaction model by Bocek and Kopecky. The QSAR methods
have not been able to replace the intuitive approach, although they have
helped to reduce the number of educated guesses in molecular
modification. Nevertheless, they have contributed directly to the
practice of drug design and medicinal research.

The fit of the structure and the complementarity of the surface
properties of a drug to its binding site at the receptor/enzyme are
essential for its biological activity. QSAR helps to understand structure
activity relationships in a quantitative manner and to find the influence
of certain properties on the biological activity; this strategy enables
chemists to look at their structures also in terms of their
physicochemical properties. Nowadays, 3D molecular modelling enables a
chemist to maintain structure bases storing 3D structures,
quantum-chemical and physicochemical parameters of chemical compounds,
and activity data. It also helps with the automated generation of
knowledge bases storing quantitative and qualitative structure activity
relationships represented by rules and models, and with the prediction of
the biological activity of novel compounds.

Most often QSAR analyses are retrospective studies, whether 
they follow a rational design of investigated structures or not. Only after 
performing synthesis and biological testing, a quantitative relationship is 
derived. Often the optimization of a lead compound is step by step 
accompanied by QSAR analysis. 
The "Principle of QSAR" may be illustrated as follows.

1.3.1 PRINCIPLE OF QUANTITATIVE STRUCTURE
ACTIVITY ANALYSIS

The resulting biological activity parameters, "A" and molecular 
parameters, "Xi" are related, since biological activity is dependent on 
molecular structure and the resulting properties. Mathematical analysis 
reveals such connections in the form of so called quantitative structure- 
activity relationships (QSARs). 

QSARs can be constructed for different purposes and according
to different methods. Structure-response relationships describe the
connections between the magnitude of a given biological effect and the
drug structure in a set of congeners; they can therefore be employed to
optimize the effect based on structural variations.

The ultimate objective of QSARs is the prediction of either
hypotheses on the mechanism of action or new analogues. With any of the
above methods, QSAR can help in recalling similar structures of
biological activity profiles by computer analysis. It can also calculate
the significance of the analogs at hand and thereby suggest additional
analogs to be synthesized. Finally, if a series of analogs shows no
discernible trend of increased potency or specificity, QSAR may indicate
that the best compound obtainable in this series has already been
prepared and that work on these compounds should be terminated.
Description of the molecular structure, electronic orbital distribution,
reactivity, reaction rates and the role of structural and steric
components and constituents of chemical compounds has been the subject of
mathematical formulation by physical-organic chemists. Its conclusions
were based on physical measurements. Among these are the determination of
pKa and lipophilicity. The contribution of QSAR after 1964 was to
quantify and evaluate relationships between such physical properties and
biological activities and to improve the methodology of measurements. For
this purpose, the chemical structure of a compound has to be transformed
into a set of numerical descriptors.

QSARs have been used to forecast biological activities with
varying degrees of reliability.

A number of success stories are listed in an excellent review by
Martin. This review also catalogs the lack of universal success in
predicting potency and mentions the following preventable circumstances
that led to failure:

1. The prediction was based on a poorly designed series or on an
ambiguous regression equation.

2. It was based on extrapolations outside the range of the physical 
properties represented by the original substituents. 

3. The conditions of the biological tests were different. 

One can expect that such experimental errors will occur less
frequently as more medicinal chemists become expert in QSAR, and that
the reliability of the method will improve.
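
The second circumstance listed above, extrapolation outside the range of the physical properties of the original substituents, can be guarded against with a simple range check before a prediction is trusted. The following is only a minimal sketch: the function name `descriptors_outside_range`, the descriptor names and all numerical values are hypothetical and are not taken from the present work.

```python
from typing import Dict, List, Sequence


def descriptors_outside_range(
    training: Dict[str, Sequence[float]],
    query: Dict[str, float],
) -> List[str]:
    """Return the descriptors whose query value lies outside the training range."""
    flagged = []
    for name, values in training.items():
        low, high = min(values), max(values)
        if not low <= query[name] <= high:
            flagged.append(name)
    return flagged


# Hypothetical descriptor values of a training series (illustration only).
training_set = {
    "logP": [0.5, 1.2, 2.0, 2.8],
    "sigma": [-0.17, 0.00, 0.23, 0.54],
}

# A proposed analogue that is more lipophilic than every training compound.
new_analogue = {"logP": 4.1, "sigma": 0.10}

flagged = descriptors_outside_range(training_set, new_analogue)
```

A non-empty result indicates that the regression equation would be used outside the descriptor space on which it was derived, so the predicted potency should be treated with caution.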

It must always be remembered that predictions derived from
QSARs have a statistical character and thus always carry a certain
probability of being in error. A basic limitation of structure activity
analysis lies in the fact that only such information can be extracted as
is present in the biological data. Thus structure activity analysis by
itself cannot lead to new concepts in therapy, although the manipulation
of the information contained in biological data and chemical structures
through the use of large computers and appropriate programs may also aid
in developing new concepts.


In summary, it may be said that methods of quantitative structure
activity analysis, in their current stage of development, have already
been found to be extremely helpful instruments without which no effective
search for new drugs is possible, and which will undoubtedly offer
further possibilities in both drug research and the elucidation of
biological mechanisms.

1.4 DRUG RECEPTOR INTERACTIONS

The concept of the interaction of drugs with certain substances,
with which they are capable of forming compounds according to their
chemical affinity, goes back to the work of Langley. The
stereospecificity of such interactions was recognized by Fischer in 1894.
In the following study, "receptor" is used as a synonym for any
biological target, for example any specific binding site of a
macromolecule. Strictly speaking, this broad meaning is not correct
according to today's definition of receptors as soluble,
membrane-anchored or membrane-embedded proteins that are able to produce
a certain biological response via a series of mostly unknown events.

During the past decades, the originally static lock-and-key model
of ligand-receptor interaction was modified to a more realistic picture,
with flexible drug molecules and dynamic receptors. Whenever a ligand
approaches its binding site, both partners may change their shape
(induced fit, flexible fit). Most of our knowledge regarding the geometry
of ligand-binding site interactions has resulted from 3D structures of
soluble proteins, especially of enzymes and their inhibitor complexes.

An important contribution to the receptor concept resulted from
recent investigations by Herbette of the partitioning into and the
distribution of drugs in biological membranes. The correct spatial
arrangement of the drug and its proper orientation in the membrane with
respect to the binding site at the surface of the membrane-embedded
receptor are considered to be of the utmost importance for the drug
receptor interaction (Fig. I, II, III).

Figure - I : A ligand may reach its binding site S1 or S2 at the receptor R
by direct diffusion in the aqueous medium or, as in the case of site S2, by
partitioning into the membrane and then diffusing to the binding site.

Figure - II : The highly ordered structure of the lipid bilayer may restrain
lipophilic drugs to a particular depth of penetration, so it becomes
important for the ligand to reach an optimum depth in order to show
maximum activity.

Figure - III : The orientation of the ligand relative to the binding site
might also be optimized by the membrane through limiting the rotational
degrees of freedom of the drug. So it becomes necessary for the chemist
to design the molecule in order to make it orientation specific.

Here, A = Position specific

B = Target specific

C = Orientation specific
SECTION-B


QSAR METHODOLOGY AND PARAMETRIZATION 

QSAR is essentially the prediction of the biological activity of a
molecule prior to its evaluation, or even its synthesis, in order to
reduce costly and time-consuming synthetic work and biological screening.

Medicinal chemists can now achieve more potent drugs, owing to
the development of a large matrix of QSAR methodology, with less
dependence on trial-and-error synthesis and testing.

2.1 HISTORY AND DEVELOPMENT OF QSAR

In 1868, Crum-Brown and Fraser published an equation which is
considered to be the first general formulation of a QSAR. They observed
distinct changes in the degree of activity paralleling somewhat minor
changes in chemical structure. Therefore they assumed that the
"physiological activity" $\Phi$ must be a function of the chemical
structure $C$:

$$\Phi = f(C) \qquad (1)$$

In 1893, Richet concluded that the degree of activity of organic
compounds is inversely related to their water solubility. This postulate
is known as Richet's rule.

$$\Delta\Phi = f(\Delta C) \qquad (2)$$

where $\Delta\Phi$ = differences in biological activity values,

$\Delta C$ = corresponding changes in the chemical and
physico-chemical properties.

At the turn of the twentieth century, Meyer and Overton
independently observed a linear relationship between lipophilicity,
expressed as oil-water partition coefficients, and narcotic activities.

According to Fühner, within homologous series narcotic activities
increase in a geometric progression, i.e. 1 : 3 : 3^2 : 3^3, etc., which
gave the first evidence of an additivity of group contributions to
biological activity values.
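
The link between a geometric progression of activities and additive group contributions can be made explicit with a short worked equation. The symbols $A_n$ (activity of the homologue carrying $n$ additional methylene groups) and $A_0$ (activity of the parent compound) are introduced here for illustration only and do not appear in the original text; a constant factor of 3 per homologue is assumed.

```latex
A_n = A_0 \cdot 3^{\,n}
\quad\Longrightarrow\quad
\log A_n = \log A_0 + n \log 3 \approx \log A_0 + 0.477\,n
```

On the logarithmic scale each added methylene group therefore contributes the same constant increment, which is what additivity of group contributions means.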


In 1939, Ferguson observed a "cut-off" in biological activities
beyond a certain range of lipophilicity and gave a thermodynamic
interpretation of such non-linear structure activity relationships.

In 1956, Bruice, Kharasch and Winzler formulated group
contributions to biological activity values in a series of thyroid
hormone analogs. Using equation (3), they obtained excellent correlations
between calculated and observed biological activities:

$$\log(\%\ \text{thyroxine-like activity}) = k \sum f + C \qquad (3)$$

where $\sum f = f_X + f_{X'} + f_{OR'}$. The subscripts X, X' and OR'
represent substituent positions of the molecule and C is a constant.

In 1964, an important contribution to the development of QSAR
models was made by Free and Wilson with their "de novo" model. They
defined the biological response (BR) as equal to the sum of the
contributions to the activity of the substituent groups plus the overall
average activity ($\mu$), which might be attributed to the activity
contribution of the parent structure, as given in equation (4).

$$\mathrm{BR} = \sum (\text{substituent group contributions}) + \mu \qquad (4)$$

At the same time, Hansch and Fujita developed an approach called
Hansch analysis (or linear free energy related approach, LFER, or
extrathermodynamic approach).
2.2 DEMANDS ON BIOLOGICAL ACTIVITY DATA
USED IN QSAR ANALYSIS

The various demands that should be made on biological activity
data used in QSAR analysis are as follows:

(1) Large range in observed activities

(2) Identical mode of action (parallel dose-response curves)

(3) Concentration in molar units (dose in g/kg is not suitable)

(4) Activity data as a function of concentration (ED50, LD50, etc.)

(5) Activity data in percentage (protein binding, metabolism, etc.)
have to be transformed.

(6) Attention to possible time dependency (steady state)
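
Demands (3) to (5) amount to simple data transformations that are applied before any regression. The sketch below illustrates them under stated assumptions: the function names and all numerical values are hypothetical, and the logit form log10(p / (100 - p)) is shown as one commonly used transformation of percentage data, not necessarily the one applied in this thesis.

```python
import math


def dose_to_mol_per_kg(dose_mg_per_kg: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass-based dose (mg/kg) into a molar dose (mol/kg)."""
    return dose_mg_per_kg / 1000.0 / molar_mass_g_per_mol


def log_inverse_concentration(molar_value: float) -> float:
    """Return log10(1/C) for a molar concentration or molar dose C."""
    if molar_value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar_value)


def logit_percent(percent: float) -> float:
    """Transform a percentage (0 < p < 100) into log10(p / (100 - p))."""
    if not 0.0 < percent < 100.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    return math.log10(percent / (100.0 - percent))


# Hypothetical values, for illustration only.
molar_dose = dose_to_mol_per_kg(10.0, 250.0)   # 10 mg/kg, M = 250 g/mol
potency = log_inverse_concentration(50e-9)     # an IC50 of 50 nM
bound = logit_percent(50.0)                    # 50 % protein binding
```

Expressing potency as log(1/C) on a molar basis makes compounds of different molecular weight comparable and gives values in which a larger number means a more active compound.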

Usually QSAR techniques are divided into two classes:

(i) Classical QSAR

(ii) 3D-QSAR

2.3 QUANTITATIVE MODELS 

2.3.1 HANSCH MODEL 

(The Extrathermodynamic Approach) 

Hansch analysis correlates biological activity values with
physicochemical properties by linear, linear multiple or nonlinear
regression analysis; thus, Hansch analysis is indeed a property-property
relationship model. As practically all parameters used in Hansch analysis
are linear free energy-related values (i.e. derived from rate or
equilibrium constants), the terms "linear free energy related approach"
or "extrathermodynamic approach" are sometimes used as synonyms for
Hansch analysis. Also, the biological activity values are, if they are
properly defined, linear free energy-related values (for example binding
or inhibition constants, absorption and distribution rate constants, or
complex data which correspond to a weighted combination of several unit
processes).

Hansch proposed that the action of a drug depends on two
processes: firstly, the journey from the point of entry into the body to
the site of action, and secondly, the interaction with the receptor site.
He suggested a linear and a non-linear dependence of biological activity
on different parameters.


Linear:

$$\log\left(\frac{1}{C}\right) = a \log P + b\,\sigma + c\,E_s + d \qquad (5)$$

Non-linear:

$$\log\left(\frac{1}{C}\right) = a \log P + b\,(\log P)^2 + c\,\sigma + d\,E_s + e \qquad (6)$$
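
A useful consequence of the non-linear form can be shown with a short worked equation. Assuming that the non-linear model contains the terms $a \log P + b\,(\log P)^2$ and that the electronic and steric terms are held fixed, the lipophilicity of maximum activity, written here as $\log P_0$ (a symbol introduced for illustration), follows from setting the derivative to zero:

```latex
\frac{\partial \log(1/C)}{\partial \log P} = a + 2b\,\log P = 0
\quad\Longrightarrow\quad
\log P_0 = -\frac{a}{2b}, \qquad b < 0 \ \text{for a maximum}.
```

A negative coefficient of the squared term thus corresponds to a parabola with an optimum lipophilicity, beyond which activity falls again.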


Key observations are:

(1) Biological activity should be quantitatively related to a set of 
theoretical parameters, which are assumed to describe essential 
properties of the drug molecule. 

(2) The coefficients are determined by multiple linear regression 
analysis. 

(3) Activity of a drug is controlled by various additive factors. 

(4) Biological activity could be described by more than one 
parameter. So multiple linear regression could be used for drug 
design. 
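
The determination of the coefficients by multiple linear regression can be sketched as follows. This is only an illustrative sketch using the Python standard library: the helper names `solve_linear_system` and `fit_mlr` are hypothetical, the descriptor values are invented, and the activities are synthetic numbers generated from known coefficients (a = 0.8, b = 1.2, c = 0.5, d = 2.0) so that the fit can be checked. They are not data from this thesis.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(descriptors: Sequence[Sequence[float]],
            activity: Sequence[float]) -> List[float]:
    """Least-squares coefficients via the normal equations; last entry is the intercept."""
    rows = [list(d) + [1.0] for d in descriptors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical (log P, sigma, Es) values for six congeners.
descriptors = [
    (0.5, 0.00, 0.00),
    (1.0, 0.23, -0.97),
    (1.5, -0.17, -1.24),
    (2.0, 0.78, -2.52),
    (2.5, 0.06, -0.46),
    (3.0, -0.27, -0.55),
]

# Synthetic log(1/C) values generated from known coefficients (not measured data).
activity = [0.8 * lp + 1.2 * s + 0.5 * es + 2.0 for lp, s, es in descriptors]

a, b, c, d = fit_mlr(descriptors, activity)
```

With real data the fit is not exact, and the regression has to be judged by statistics such as the correlation coefficient and the standard deviation; solving the normal equations directly is chosen here only to keep the sketch free of external libraries.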
2.3.1.1 ADVANTAGES AND DISADVANTAGES OF
HANSCH ANALYSIS

Advantages: 

(1) Use of descriptors (pi, sigma, Es, etc.) from small organic 
molecules may be applied to biological systems. 

(2) Predictions are quantitative and may be evaluated statistically. 

(3) It is quick and easy. 

(4) Potential for extrapolation: conclusions reached may be extended to
chemical substituents not included in the original analysis.

Disadvantages:

(1) Descriptors are required for the substituents being used.

(2) A large number of compounds is required (a training set for which
physicochemical parameters and biological activity are available).

(3) Limitations associated with using small molecule descriptors on 
biological systems. 

(4) Steric factors of limited applicability in biological systems. 

(5) Predictions are limited to a structural class (congeneric series).

(6) Extrapolations beyond the values of the descriptors used in the study
are limited.
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2.3.2 FREE WILSOH MODEL (THE ADDITIVITY MODEL) 


(de novo approach) 

This method is based on the assumption that the introduction of a
particular substituent at a particular molecular position always leads to a
quantitatively similar effect on the biological potency of the whole molecule.
The Free Wilson approach is a true structure-activity relationship model.
An indicator variable is generated for each structural feature that deviates
from an arbitrarily chosen reference compound. Values of one, indicating the
presence of a certain substituent or structural feature, and zero, indicating
its absence, are correlated with the biological activity values by linear
multiple regression analysis. The resulting regression coefficients of the
indicator variables are the biological activity contributions of the
corresponding structural elements. "Mathematical model", "additivity model"
and "de novo approach" are synonyms for the Free Wilson method.

It is expressed by the equation:

Log BA = contribution of the unsubstituted parent compound +
contribution of the corresponding substituents.

$$\log BA = \mu + \sum_{i} a_{ij} \tag{7}$$

where μ is the contribution of the unsubstituted parent compound, a_ij is
the contribution of the substituent, 'i' is the number of the position at
which substitution occurs and 'j' is the number of the substituent at that
position.

The basic assumptions for the use of the Free Wilson method are:

(1) All drugs tested should have the same parent structure.

(2) The substitution pattern in the various derivatives has to be the same.

(3) The substituents have to contribute to the biological activity
additively and, in the same position, with a constant amount that is
independent of the presence or absence of the other substituents
in the molecule.
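
The indicator-variable regression described above can be sketched in Python as follows, assuming NumPy is available. The six compounds, their activities, the position labels R1 and R2 and the function name free_wilson are all hypothetical; the activities were constructed to be strictly additive so that the recovered contributions are easy to check.

```python
import numpy as np

# Hypothetical series: two substitution positions (R1, R2); the parent
# compound carries H at both. Activities are invented and strictly additive.
compounds = [
    {"R1": "H",  "R2": "H"},
    {"R1": "Cl", "R2": "H"},
    {"R1": "H",  "R2": "CH3"},
    {"R1": "Cl", "R2": "CH3"},
    {"R1": "F",  "R2": "H"},
    {"R1": "F",  "R2": "CH3"},
]
log_ba = np.array([1.00, 1.40, 1.25, 1.65, 1.10, 1.35])


def free_wilson(compounds, log_ba, parent="H"):
    """Return (mu, {(position, substituent): contribution})."""
    # One indicator variable per substituent that differs from the parent.
    features = sorted({(pos, sub) for c in compounds
                       for pos, sub in c.items() if sub != parent})
    X = np.array([[1.0 if c[pos] == sub else 0.0 for pos, sub in features]
                  for c in compounds])
    design = np.column_stack([np.ones(len(compounds)), X])  # mu column first
    coeffs, *_ = np.linalg.lstsq(design, log_ba, rcond=None)
    return coeffs[0], dict(zip(features, coeffs[1:]))


mu, contrib = free_wilson(compounds, log_ba)
```

Here mu is the activity contribution of the parent compound and each entry of contrib is the group contribution a_ij of one substituent at one position.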

A major limitation of this approach is that the substituent contributions
it yields are not transferable (unlike π, σ, etc.): they are applicable only
to the system under study.

2.3.3 THE MIXED APPROACH

(The Relationship between Hansch and Free Wilson
Analysis)

Hansch analysis and the Free Wilson method differ in their
application, but they are nevertheless closely related. Kubinyi has
presented the combination of the Hansch and Free Wilson models as the
"mixed approach". The mixed approach can be written as:


$$\log\left(\frac{1}{C}\right) = \sum a_{ij} + \sum K_j \phi_j + K \tag{8}$$


where 'K_j' represents the coefficients of the different physicochemical
parameters.

In this equation, 'Σa_ij' is the Free Wilson part for the substituents
and φ_j are the π, σ and Es contributions of the parent skeleton. The mixed
approach was developed to find possible interactions between Free Wilson
parameters and the physicochemical properties of the substituents used.

The basic assumptions of this method are:

(1) All the drugs tested have the same parent structure.

(2) The substitution pattern in the various derivatives has to be the
same.

(3) The substituents contribute to the biological activity
additively, independent of the presence or absence of
other substituents.

Today the mixed approach is the most powerful tool for the
quantitative description of large and structurally diverse data sets.

2.3.4 OTHER QSAR APPROACHES

2.3.4.1 CRAMER'S SUB-STRUCTURAL ANALYSIS

Berkoff, Cramer and Redl developed this method, which is
one of the first attempts to apply substructural analysis in the QSAR field.
The compounds of a training series are fragmented into structures using a
library of atom, bond and substructural topological features. For each
feature, a substructural activity frequency (SAF) is calculated as:


$$SAF_j = \frac{\text{number of active compounds containing the feature } j}{\text{total number of compounds containing the feature } j} \tag{9}$$

where SAF_j is the probability contribution of the feature j to the
overall probability of activity. To characterize the activity of compounds,
the mean substructure activity frequency for the i-th compound is given as

$$MSAF_i = \frac{1}{m_i} \sum_{j} b_{ij}\, SAF_j \tag{10}$$

where m_i is the number of features (fragments) occurring in the i-th
compound and b_ij is a substructural descriptor defined as:

$$b_{ij} = \begin{cases} 1, & \text{if the } j\text{-th feature is present in the } i\text{-th compound} \\ 0, & \text{if not} \end{cases} \tag{11}$$

On the basis of experimental studies, it is observed that MSAF
values are related to biological activity.
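
The SAF and MSAF quantities defined above can be computed with a few lines of plain Python. In this sketch the feature names, the activity flags and the function names saf and msaf are hypothetical, and each compound is represented simply as a set of the fragments it contains (so that b_ij = 1 exactly for the members of the set).

```python
# Hypothetical training series: each compound is a set of substructural
# features plus an active/inactive flag. Feature names are invented.
training = [
    ({"phenol", "amine"}, True),
    ({"phenol", "ester"}, True),
    ({"amine", "ester"}, False),
    ({"phenol"}, True),
    ({"ester"}, False),
]


def saf(training):
    """SAF_j = active compounds containing j / all compounds containing j."""
    total, active = {}, {}
    for features, is_active in training:
        for f in features:
            total[f] = total.get(f, 0) + 1
            active[f] = active.get(f, 0) + (1 if is_active else 0)
    return {f: active[f] / total[f] for f in total}


def msaf(features, saf_values):
    """Mean SAF over the features of one compound that occur in the library."""
    present = [f for f in features if f in saf_values]
    return sum(saf_values[f] for f in present) / len(present)


saf_values = saf(training)
score = msaf({"phenol", "amine"}, saf_values)
```

In this invented series every phenol-containing compound is active, so the phenol fragment has an SAF of 1, and a compound carrying phenol and amine fragments obtains the mean of the two frequencies.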

2.3.4.2 PRINCIPAL COMPONENT ANALYSIS

In Principal Component Analysis, data matrices obtained by
measuring a given set of variables for a given set of objects are examined.
Variables are represented as linear combinations of new variables called
principal components.

This analysis is basically a mathematical method of describing
and reproducing the variables of a data matrix by means of a new set of
"abstract" variables, i.e., the principal components.

This approach has been discussed extensively by Weiner and
Malinowski, who applied it to a variety of problems in chemistry.

The model of principal component analysis may be written as:

$$x_{ij} = \sum_{k=1}^{r} u_{ik} v_{kj} + \bar{x}_j + E_{ij} \tag{12}$$

or in matrix notation:

$$X = UV + \bar{X} + E \tag{13}$$

where

x_ij = value of the j-th variable for the i-th object.

r = minimum number of components necessary to reproduce the x_ij
within E_ij.

u_ik = i-th element of the k-th so-called object component, describing
the k-th property of the i-th object.

v_kj = j-th element of the k-th so-called system component, describing
the k-th property of the j-th variable.

x̄_j = mean value of the j-th variable.

E_ij = residual error (experimental plus model error).

X = data matrix with the objects in the rows and the variables in the
columns.

U = object component matrix (elements: u_ik).

V = system component matrix (elements: v_kj).

X̄ = vector of mean values (elements: x̄_j).

E = error matrix (elements: E_ij).

Once all object components have been identified with test vectors,
they can be replaced by the latter. Equation (12) then transforms to:

$$x_{ij} = \sum_{k=1}^{r} t_{ik} v_{kj} + c_j + E_{ij} \tag{14}$$

where t_ik denotes the i-th element of the k-th test vector.

Equation (14) is the desired result of principal component
analysis, representing a multivariate system of "regression" equations.

This method requires data of sufficient precision and objectively
existing relations among the various systems.

The main steps of principal component analysis are summarized as:

1. Calculation of the correlation matrix.

2. Evaluation of object and system components by the principal
component method.

3. Determination of the number (r) of relevant principal components
necessary to reproduce the data matrix within experimental error.

4. Uniqueness test.

5. Selection of test vectors.

6. Identification of the object components with test vectors.

7. Formulation of the system of "regression" equations describing
the variables in terms of the parameters used as test vectors.
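
The decomposition of a data matrix into object components, system components, mean values and residuals can be sketched in Python as below, assuming NumPy is available. Note that this sketch uses a singular value decomposition of the mean-centred data rather than an explicit correlation matrix, and it covers only the determination of r; the uniqueness test and the test-vector steps are not shown. The function name pca_model, the tolerance and the generated data are hypothetical.

```python
import numpy as np


def pca_model(X, tol):
    """Split X into mean + U V + E using the fewest components r whose
    residuals all stay within `tol` (an assumed experimental error)."""
    x_mean = X.mean(axis=0)                 # vector of column means
    Xc = X - x_mean                         # mean-centred data
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    for r in range(0, len(s) + 1):
        U = u[:, :r] * s[:r]                # object components
        V = vt[:r, :]                       # system components
        E = Xc - U @ V                      # residual matrix
        if np.max(np.abs(E)) <= tol:
            return r, U, V, x_mean, E
    return len(s), U, V, x_mean, E


# Hypothetical data: 6 objects, 4 variables generated from 2 hidden factors.
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 2))
loadings = rng.normal(size=(2, 4))
X = scores @ loadings + np.array([1.0, 2.0, 3.0, 4.0])

r, U, V, x_mean, E = pca_model(X, tol=1e-8)
```

Since the invented data were built from two underlying factors, two components are sufficient to reproduce the matrix within the chosen tolerance.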

2.3.4.3 RANK CORRELATION ANALYSIS

Sklenar and Jager developed this analysis.

Before applying this procedure, it is assumed that
within a homologous series there exists a defined and monotonic relation
between the level of biological action and the molecular properties relevant
to that action. This is a less strict requirement than the linear model
assumed to hold in the extrathermodynamic approach, which is certainly
realistic, especially when dealing with quantum chemical molecular
parameters. Such relations can be examined by means of rank correlation
analysis. The first step comprises the transformation of the values of
biological activity and the molecular parameters concerned into rank
numbers. These numbers indicate the position of each value when the
respective data are ranked in decreasing order.

With the aid of the rank numbers, the rank correlation coefficient is
obtained from the equation

$$r_{si} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} \tag{15}$$

where d_i denotes the difference of the rank numbers of parameter
x_i and biological activity of the i-th compound and n is the number of
compounds in the sample. Equation (15) serves to characterise the
connection between a given molecular property and biological activity. If
the correlation between A and x_i is perfect, |r_si| = 1.
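
A minimal Python sketch of the rank transformation and of equation (15) is given below, assuming NumPy is available. The parameter and activity values are invented, the function names are hypothetical, and tied values are not handled.

```python
import numpy as np


def rank_numbers(values):
    """Rank 1 = largest value (decreasing order); assumes no ties."""
    order = np.argsort(-np.asarray(values, dtype=float))
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def rank_correlation(x, activity):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n)."""
    d = rank_numbers(x) - rank_numbers(activity)
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n ** 3 - n)


# Invented values for five compounds.
x = [0.5, 1.9, 1.1, 2.4, 3.0]
activity = [4.1, 5.6, 5.0, 6.3, 6.0]
r_s = rank_correlation(x, activity)
```

In this example only the two highest-ranked compounds exchange places, giving a sum of squared rank differences of 2 and hence r_s = 0.9.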

2.3.4.4 LINEAR DISCRIMINANT ANALYSIS

The discriminant functions in this analysis serve as classification
algorithms, representing weighted linear combinations of features relevant
for class separation. There are several types of discriminant functions,
among which the non-elementary ones have optimal properties.

If Q classes occur (q = 1, ..., Q), there are Q - 1 non-elementary
discriminant functions w_k (k = 1, ..., Q - 1) of the general form

$$w_k = \sum_{j=1}^{m} a_{kj} x_j \tag{16}$$

where a_kj denotes the weight factor (coefficient) of the j-th
molecular parameter x_j (j = 1, ..., m) in the k-th discriminant function,
the w_k's being designated as non-elementary discriminant variables.

The main steps of this analysis are:

1. Selection of molecular parameters and formulation of approaches.

2. Check whether the classes can be separated using these parameters,
and elimination of parameters not relevant for class separation.

3. Computation of non-elementary discriminant functions.

4. Interpretation of the discriminant functions and reclassification of
the compounds from the training series as a means of testing the
quality of separation.

5. Classification of compounds not yet investigated, so that further
synthesis and tests can be planned while allowing for mechanistic
conclusions from the form of the discriminant functions.
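
For the simplest case of two classes (Q = 2) there is a single discriminant function, and steps 3 and 4 above can be illustrated with Fisher's two-class linear discriminant. The Python sketch below assumes NumPy is available; the descriptor values, the class labels and the function names are hypothetical, and the midpoint between the class means is used as the decision threshold, which is one common choice rather than the only possible one.

```python
import numpy as np


def fisher_discriminant(X_active, X_inactive):
    """Weights a_j and threshold for a two-class linear discriminant."""
    m1, m2 = X_active.mean(axis=0), X_inactive.mean(axis=0)
    # Pooled within-class scatter matrix.
    S = (np.cov(X_active, rowvar=False) * (len(X_active) - 1)
         + np.cov(X_inactive, rowvar=False) * (len(X_inactive) - 1))
    a = np.linalg.solve(S, m1 - m2)          # weight factors
    threshold = a @ (m1 + m2) / 2.0          # midpoint between class means
    return a, threshold


def classify(x, a, threshold):
    """Assign a compound to a class from its discriminant variable w."""
    return "active" if a @ x > threshold else "inactive"


# Invented descriptors (for example log P and MR) for a training series.
X_active = np.array([[2.1, 30.0], [2.6, 34.0], [3.0, 31.0], [2.4, 36.0]])
X_inactive = np.array([[0.8, 22.0], [1.1, 25.0], [0.5, 20.0], [1.4, 24.0]])

a, threshold = fisher_discriminant(X_active, X_inactive)
# Reclassification of the training series as a check of separation quality.
labels = [classify(x, a, threshold)
          for x in np.vstack([X_active, X_inactive])]
```

Reclassifying the training compounds, as in step 4, returns the original class for every compound of this invented series.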

With several modifications, discriminant analysis has been
widely and successfully applied in the QSAR field by several workers
(refs. 81-88).

2.3.4.5 PATTERN RECOGNITION TECHNIQUES

The pattern recognition technique is similar to the classical QSAR
method, except that the number of variables is much larger than in Hansch
analysis. Problems are associated with the proper selection of a training
set and with stepwise regression. Therefore multivariate methods like PCA,
or soft modelling techniques, for example SIMCA or PLS analysis, are used
instead of regression analysis.

2.3.4.6 ARTIFICIAL INTELLIGENCE PROGRAMS

CASE (Computer Automated Structure Evaluation)
automatically identifies molecular features (biophores) that contribute to
biological activity. GOLEM, a machine learning program from the field of
inductive logic programming, uses activity, structural and stereochemical
information on active and inactive analogues to derive inductive hypotheses.

2.3.4.7 MOLECULAR ORBITAL METHOD

The molecular orbital method is one of the most important
among several semi-empirical approaches used in drug design. As per the
basic assumption of this theory, the electrons are considered as being
associated with the molecule as a whole rather than with a particular
substructure. The molecular orbital wave functions (φ) constructed from
atomic orbitals provide information about the physical properties of a
molecule. This in turn helps to obtain information about ionization
potential, electron affinity, etc.

2.4 3-D QSAR 

Three-Dimensional Quantitative Structure Activity Relationship
(3D-QSAR) analysis is a relatively new area of Computer Assisted
Molecular Design (CAMD). A 3D-QSAR study relies on the ability to
characterize the shape and charge distribution of a molecule in three
dimensions. This is a consequence of the predominance of steric and
electrostatic interactions in the binding of a drug to its targeted receptor
site.

3D-QSARs are developed for specific, highly anisotropic ligand-receptor
interactions that correspond to in vitro biological assays. In the
very limited applications of 3D-QSAR to in vivo data sets, the problem has
usually been thought of in terms of a specific ligand-receptor interaction
component (specific 3D descriptors) and a non-specific transport,
metabolism or other component (general thermodynamic descriptors such
as log P). Overall, 3D-QSARs probe and extract information about a
specific interaction involving the ligand, which almost always also
involves the ligand's receptor site. Thus, one useful way to compare and
contrast 3D-QSAR methods is to identify what aspects of the general
ligand-receptor binding process are being considered in a particular
3D-QSAR formalism.

2.5 PHARMACOPHORE MAPPING


A pharmacophore or pharmacophoric pattern is the set of features
required for a compound to elicit a certain biological activity. These
features are typically any combination of structural, chemical and physical
attributes of a molecular structure. The process of developing a
pharmacophoric pattern, i.e. pharmacophore mapping, involves three main
aspects:

1. Finding the features required for biological activity.

2. Determining the molecular conformation required (i.e. the
bioactive conformation).

3. Developing a superposition or alignment rule for a series of
compounds.

2.6 3D-QSAR APPROACHES 

This section deals with the various techniques of pharmacophore mapping
that serve as the important 3D-QSAR approaches, which are as follows:

2.6.1 COMPARATIVE MOLECULAR FIELD ANALYSIS (CoMFA) 

To some, 3D-QSAR analysis and CoMFA are one and the
same. CoMFA is by far the most often employed 3D-QSAR approach,
reflecting a novel, conceptually satisfying scientific approach reduced to
practice as a well-written and versatile software package. There are many
reports in the literature of successful applications of CoMFA that have
led not only to predictive models within an analogue series of biologically
active molecules but also to insightful information on the general
requirements for the expression of the activity.

It is a 3D-QSAR approach which places molecules in a grid and
correlates field properties (electrostatic and steric field energies) with
biological activities.

The method was developed by Cramer et al. in 1988. It involves
the following steps.

1. Selection of a reference compound and structural alignment.

2. Electronic charge calculation on each compound of the series.

3. Calculation of electrostatic and steric field energies at various
grid points in a lattice of specified dimensions using probe atoms
or groups.

4. Regression analysis of the calculated field energies against
biological activity.

5. Testing of CoMFA models.

Because of the increasing availability of powerful computational
hardware and software and of the advances in our basic understanding of
theoretical chemistry, chemists now have the possibility of calculating
many properties of molecules which have not yet been synthesized and of
being reasonably assured that the synthesized molecules will in fact
exhibit those properties.

2.6.2 MOLECULAR SHAPE ANALYSIS (MSA)

Molecular shape analysis (MSA) is a formalism that deals with the
quantitative characterization, representation and manipulation of molecular
shape in the construction of a QSAR. This method was developed
by Hopfinger, who incorporated conformational analysis into Hansch
analysis. The Common Overlap Steric Volume (COSV) between a pair
of superimposed molecules can be used as a global measure of molecular
shape similarity in constructing QSARs. Subsequently, the spatially
integrated potential energy field was shown to be a complementary
extension of COSV as a general QSAR shape descriptor.





2.6.3 RECEPTOR SURFACE MODEL GENERATION (RSM)

A receptor surface is generated complementary to the active
molecules. From this, it can be seen at which sites interaction with the drug
takes place in terms of hydrophobic, electrostatic, hydrogen bond acceptor
or donor, or charge interactions. RSM can be used to predict the activity of
newly designed compounds. The steps followed are:

1. Selection of reference compounds and structure alignment.

2. RSM generation using a van der Waals field function.

3. Mapping of electrostatic potential, charge, hydrogen bonding and
hydrophobicity properties onto the RSM.

4. Evaluation of the RSM and calculation of different types of
interaction energies between structure and RSM.

5. Generation of the QSAR equation.

2.6.4 HYPOTHETICAL ACTIVE SITE LATTICE MODEL (HASL) 

The hypothetical active site lattice approach (HASL) is related
to the CoMFA methodology and also to MSA. The HASL approach
represents the shape of each molecule as a collection of 3D grid
points, which is termed the molecular lattice. The resolution of the HASL
(i.e. the distance between the grid points) determines the number of lattice
points that represent a molecule and also the resolution of the generated
receptor map. User-defined conformations are selected to generate the HASL.
Typically, conformations similar in shape are chosen.

The two aims of the HASL approach are the prediction of the activities
of compounds of interest and the identification of the substructures that
most influence the observed activities.

2.6.5 CATALYST


Catalyst is a popular, turnkey commercial software package that
establishes 3D-QSARs based on a training set of compounds and their
activities against a common end point. The turnkey aspect of the
package refers to the user having only to sketch in the structures in the
training set, input the corresponding activity measures and provide some
control data, such as the number of conformations to be sampled in the
conformational search for the active conformation. The output is a graphic
representation of the most active compound in its active conformation, with
the 3D pharmacophore found in the analysis superimposed on the
compound.

2.6.6 APEX-3D PROGRAM 

The Apex-3D program can evaluate 2D (topological) or 3D
(topographical) relations between the pharmacophoric points. Generation of
biophores (pharmacophores) involves determining low-energy,
representative conformers for each compound, calculating descriptors for
potential biophoric atoms and searching (using a clique detection
algorithm) for maximal common 2D or 3D arrangements of biophoric
centres. These arrangements or patterns are potential biophores and are
then evaluated for their statistical quantitative correlation with biological
activity.

2.7 APPLICATION OF QSAR 

QSAR has three essential applications:

(1) As an instrument for prediction.

(a) Estimation of physico-chemical properties using
substituent constants.

(b) Reduction of the number of compounds to be synthesized.

(c) Faster detection of the most active compound.

(d) Avoidance of synthesis of compounds with the same activity.

(2) As a diagnostic instrument.

(a) Information on possible types of interaction forces.

(b) Information on the "nature" of the receptor.

(c) Information on the mechanism of action.

(3) Detection of exceptions (outliers).

2.8 LIMITATIONS OF QSAR 

Though QSAR studies can be successfully utilized to predict the
activity of new analogues and in the determination of the mechanism of
drug-receptor interactions, they have some drawbacks and limitations.

The most serious problem in QSAR is the lack of fundamental
understanding of how to quantitatively describe substituent effects on
non-covalent intermolecular (for example drug-receptor) interactions.

Mutual conformational adaptation of drug and receptor may also
occur after interaction. Since no specific parameter has yet been developed
for the description of variation in conformation and conformational
flexibility, this limits the success of QSAR analysis.

Electronic effects of the substituent may change both the degree
of ionization and the charge distribution. The former may affect the amount
of active species available to the receptor, while the latter may affect the
strength of the drug-receptor interaction.

A QSAR study may be incorrectly interpreted if the biological
property of interest is not correctly measured.

2.9 QSAR DESCRIPTORS (QSAR PARAMETERS)


Descriptors are needed to describe the intermolecular forces of
the drug-receptor interaction and the transport and distribution of the drug
in a quantitative manner, and to correlate them with the biological
activities. The QSAR descriptors have been broadly classified into:

(1) Conventional and

(2) Non-conventional

Conventional descriptors are used in the classical approach towards
QSAR. These include thermodynamic, electronic and steric parameters.
Non-conventional or advanced parameters explain the 3D electronic and
steric characteristics of the molecules. Examples include molecular volume,
molecular surface area, density, dipole moments, etc.

Some of the important parameters are described below:

2.9.1 LIPOPHILIC/THERMODYNAMIC/HYDROPHOBIC
PARAMETERS

No other physicochemical property has attracted as much interest
in QSAR studies as lipophilicity (synonymously called
hydrophobicity), due to its direct relationship to solubility in aqueous
phases, to membrane permeability, and to its (merely entropic) contribution
to ligand binding at the receptor site. Some important lipophilic parameters
are log P (partition coefficient), Rm (chromatographic parameter), S
(entropy) and π (hydrophobic constant).

2.9.2 STERIC/SPATIAL PARAMETERS 

The steric substituent constant is a measure of the bulkiness of the
group it represents and of its effect on the closeness of contact between
the drug and the receptor site. As a bulky substituent delays the detachment
of the drug from the receptor, it leads to late onset and long duration of
action. Steric effects are difficult to describe because the 3D structures
of the binding sites of drugs are most often unknown. Some important
steric parameters are Es (Taft steric parameter), χ (molecular
connectivities), MSD (minimal steric difference), L, B1-B4 (Sterimol
parameters), PMI (principal moment of inertia), MR (molar refractivity),
PSA (polar surface area), Vw (van der Waals volume), etc.

2.9.3 ELECTRONIC PARAMETERS (refs. 93, 109, 110)

Electronic parameters describe the influence of a certain group or
substituent on the electron density distribution and thus its effect on
biological activity. They affect the metabolism and elimination pattern of
the drug and the drug-receptor interaction. Some of the widely used
electronic parameters are: Hammett σ constants, field and resonance
parameters F and R, parameters derived from molecular spectroscopy, pKa
values, charge transfer constants, dipole moments, hydrogen bonding
parameters and parameters derived from quantum-chemical calculations, for
example orbital energies and partial charges.

2.9.4 POLARIZABILITY PARAMETERS 


Molar volume (MV), molar refractivity (MR), molar polarisation
(MP) and parachor (PA) are theoretically and practically closely
interrelated parameters:

$$MV = \frac{MW}{d} \tag{17}$$

$$MR = MV\,\frac{\eta^2 - 1}{\eta^2 + 2} = \frac{MW}{d}\cdot\frac{\eta^2 - 1}{\eta^2 + 2} \quad (\text{cm}^3/\text{mol}) \tag{18}$$

$$MP = MV\,\frac{\varepsilon - 1}{\varepsilon + 2} \tag{19}$$

$$PA = MV\,\gamma^{1/4} \tag{20}$$

where,

MW = molecular weight

d = density

η = refractive index at 20°C

ε = dielectric constant

γ = surface tension

Molar volume itself is not a strictly additive parameter, but the
corrected volume parameters MR and PA are additive constitutive
molecular properties. For liquids, the MR value can be calculated in units
of volume using the Lorentz-Lorenz equation (18).

MR has been correlated with lipophilicity, molar volume and
steric bulk. Due to its MV component, it is related to the volume and size
of a substituent and thus contributes steric properties. The refractive
index (η) related correction term in MR accounts for the polarizability and
thus for the size and the polarity of a certain group. The larger the polar
part of a molecule, the larger its MR value will be.

The parachor is a molar volume, MV, which has been corrected
for forces of intermolecular attraction by multiplying by the fourth root of
the surface tension, γ. The parachor has an advantage as a steric parameter
in that it is easy to calculate, either from atomic contributions or from the
component chemical bonds.
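
The four interrelated quantities can be written as short Python functions, as sketched below. The function names are hypothetical, and the benzene inputs are approximate handbook values at 20°C that are assumed here purely for illustration.

```python
def molar_volume(mw, density):
    """MV = MW / d, in cm^3/mol when MW is in g/mol and d in g/cm^3."""
    return mw / density


def molar_refractivity(mw, density, refr_index):
    """Lorentz-Lorenz: MR = MV * (n^2 - 1) / (n^2 + 2)."""
    n2 = refr_index ** 2
    return molar_volume(mw, density) * (n2 - 1.0) / (n2 + 2.0)


def molar_polarisation(mw, density, dielectric):
    """MP = MV * (eps - 1) / (eps + 2)."""
    return molar_volume(mw, density) * (dielectric - 1.0) / (dielectric + 2.0)


def parachor(mw, density, surface_tension):
    """PA = MV * gamma^(1/4)."""
    return molar_volume(mw, density) * surface_tension ** 0.25


# Approximate values for benzene at 20 degrees C (assumed inputs).
mr_benzene = molar_refractivity(mw=78.11, density=0.8765, refr_index=1.5011)
```

With these inputs the molar refractivity of benzene comes out close to 26.3 cm^3/mol.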

2.9.5 TOPOLOGICAL PARAMETERS 

Numbers reflecting certain structural features of organic
molecules that are obtained from the respective molecular graphs are
usually called "topological indices". Such a number is usually obtained by
imposing certain conditions on vertices (atoms), edges (bonds) or both.

A plethora of topological indices has been considered in the
chemical literature, and some of them have been found to possess quite
remarkable applications in chemistry and in drug research. A topological
index expresses topological information for a given chemical structure.

The advantage of topological indices is that they may be used
directly as simple numerical descriptors in quantitative structure-property
and structure-activity relationships (QSPR, QSAR). These relationships are
mathematical models that enable the prediction of properties and/or
activities from structural parameters.

Most of the topological indices are derived from the distance
matrix, the adjacency matrix, or some combination of the two.

Some of the popular topological indices are:

Wiener index (W), Harary index (Har), Balaban index (B), information
theoretic index (ISIZ), quadratic index (Qindex), ramification index
(RAM), Kier and Hall valence connectivity indices, centralization
(Cent), variation (Var), etc.
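
As a simple example of an index derived from the distance matrix, the Wiener index (the sum of the topological distances between all pairs of atoms in the hydrogen-suppressed graph) can be computed from an adjacency matrix as sketched below in plain Python. The function names are hypothetical, and the shortest paths are obtained with the Floyd-Warshall algorithm, which is only one of several possible ways of building the distance matrix.

```python
def distance_matrix(adjacency):
    """All-pairs shortest path lengths (in bonds) by Floyd-Warshall."""
    n = len(adjacency)
    inf = float("inf")
    dist = [[0 if i == j else (1 if adjacency[i][j] else inf)
             for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def wiener_index(adjacency):
    """Sum of topological distances over all unordered atom pairs."""
    dist = distance_matrix(adjacency)
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


# Hydrogen-suppressed graph of n-butane: C1-C2-C3-C4.
n_butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Isobutane: central carbon bonded to three methyl carbons.
isobutane = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
```

For n-butane the index is 10 and for the more compact isobutane it is 9, which illustrates how the index responds to branching.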

2.10 PARAMETERS USED IN THE PRESENT STUDY
(THESIS)

1. Partition Coefficient (log P)

It is a free energy-related parameter (LFER), which expresses the
relative free energy change occurring on moving a substituent from one
phase to another. This is an additive property: with the help of the π
values of the substituents, the log P value of any molecule may be
calculated by simple addition.

$$\log P = \sum_{i=1}^{m} \pi_i \quad \text{(additive free energy)} \tag{21}$$





Similarly, the hydrophobic substituent constant, π, of a given substituent
X is the difference of the log P values of the substituted compound R-X and
the unsubstituted compound R-H:

$$\pi_X = \log P_{(R\text{-}X)} - \log P_{(R\text{-}H)} \tag{22}$$

$$\pi_X = \log\left(\frac{P_{(R\text{-}X)}}{P_{(R\text{-}H)}}\right) \tag{23}$$

This parameter describes the partitioning behaviour of the
molecule in aqueous and lipid phases.
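
The additivity of the hydrophobic substituent constant can be illustrated with a short Python sketch. The function names are hypothetical, and the octanol/water log P values used (benzene 2.13, toluene 2.69, chlorobenzene 2.84) are approximate literature values assumed here for illustration only; the estimate adds the substituent constants to the log P of the parent compound.

```python
def pi_constant(log_p_substituted, log_p_parent):
    """pi_X = log P(R-X) - log P(R-H)."""
    return log_p_substituted - log_p_parent


def estimate_log_p(log_p_parent, pi_values):
    """Additive estimate: parent log P plus the pi values of the substituents."""
    return log_p_parent + sum(pi_values)


# Approximate octanol/water log P values (assumed inputs).
LOG_P_BENZENE = 2.13
pi_ch3 = pi_constant(2.69, LOG_P_BENZENE)    # from toluene
pi_cl = pi_constant(2.84, LOG_P_BENZENE)     # from chlorobenzene

# Additive estimate for a chlorotoluene, ignoring positional effects.
log_p_chlorotoluene = estimate_log_p(LOG_P_BENZENE, [pi_ch3, pi_cl])
```

Such an estimate ignores interactions between substituents, so it should be read as a first approximation rather than as a measured value.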


In the present thesis, log P is calculated by the method proposed
by Moriguchi et al.

2. Molar Refractivity (MR)

This parameter was proposed by Pauling and Pressman. It is a 
parameter for correlation of dispersion forces in the binding of haptens to 
antibodies. 


It is formulated as:

$$MR = \frac{\eta^2 - 1}{\eta^2 + 2}\cdot\frac{MW}{d} \tag{24}$$

where,

η = refractive index of the compound

d = density of the compound

MW = molecular weight of the compound

MR is an additive constitutive molecular property, like log P. MR
has been correlated with lipophilicity, molar volume and steric bulk. The
refractive index-related correction term in MR accounts for polarizability
and thus for the size and polarity of certain groups. The larger the polar
part of a molecule is, the larger its MR value will be. A positive sign of MR
in a QSAR equation can be explained by binding of the substituents to a
polar surface, while a negative sign or a non-linear relationship indicates a
limited area or steric hindrance at this binding site. In the present thesis,
MR is calculated by the method proposed by Ghose and Crippen.

3. Equalized Electronegativity (χeq)

A significant development in the electronegativity concept has
been provided by Sanderson's formulation of the principle of
electronegativity equalization, which states that "when two or more
elements initially different in electronegativity combine chemically, they
become adjusted to the same intermediate electronegativity within the
compound". This principle, which has gained wide acceptance in recent
years, abandons the idea of fixed electronegativity and redefines the values
in the electronegativity table as quantities characteristic of an isolated
atom before a bond is formed.

The physical and chemical properties of substances are largely
determined by the partial charges on the constituent atoms, and the
evaluation of these partial charges is an important electronegativity
application. In the framework of Sanderson's principle, it is generally
believed that the partial charge acquired by an atom through chemical
combination is proportional to the difference between the final equalized
electronegativity and the initial, pre-bonded electronegativity.

The charge conservation equation leads to a general expression for χeq:

$$\chi_{eq} = \frac{N}{\sum \left( v / \chi \right)} \tag{25}$$

where,

N = Σ(v) = the total number of atoms present in the species formula.

v = number of atoms of a particular element in the species formula.

χ = electronegativity of that particular atom.

The group electronegativity is calculated by:

$$\chi_G = \frac{N_G}{\sum \left( v / \chi \right)} \tag{26}$$

where N_G is the number of atoms in the group formula.

4. Van der Waals Volume (Vw)

The van der Waals volume (Vw) has been found to be one of the
most fundamental characteristics of the drug structure controlling
biological activity. It determines the molecular size and shape of the
compounds, which is a very important aspect of drug-receptor interactions.
Moreover, the hydrophobic behaviour of drug molecules has been shown
to be significantly correlated with Vw (Moriguchi et al.). Consequently,
Vw has been found to be related to various biological activities of drugs.

To find the van der Waals volume (Vw) of molecules, spherical
shapes were assumed for all atoms according to Bondi, because of the
absence of generally accepted pear shapes.

Since van der Waals radii are greater than covalent radii, a
correction for sphere overlapping due to covalent bonding between atoms
was needed for the calculation of Vw for polyatomic molecules.

$$V_w = \sum (\text{sphere volume of atoms}) + \sum (\text{correction value between atoms}) \tag{27}$$

SECTION-C

STATISTICAL METHODS USED IN QSAR ANALYSIS


The primary objective of QSAR is to predict the biological
activity of new compounds in the test set. The multivariate statistical
approach is the best possible tool to use, since it is the best way of
utilizing all the information required at the same time. Various
multivariate methods have been developed for describing the structure of the
available data sets and thereby predicting the behaviour of new
samples.

1.3.1 REGRESSION ANALYSIS

Regression is a measure of the average relationship between two or
more variables in terms of the original units of the data. Regression
analysis correlates independent X-variables (for example physico-chemical
parameters, indicator variables) with dependent Y-variables (for example
biological data). The dependent variables contain error terms, while the
independent variables are supposed to contain no such error. Regression
analysis is used for estimating or predicting the unknown value of one
variable from the known values of other variables. There are different
types of regression analysis.

a. Simple linear regression 

A single independent variable is used for each calculation and a 
set of QSAR equation is generated. Each equation contains one variable 
from the descriptor set. 

b. Multiple linear regression 

It calculates QSAR equations by performing standard 
multivariable regression calculations using multiple variables in a single 
equation. The variables should be independent of one another, which 
minimizes the possibility of chance correlation. The number of independent 
variables should not be more than one fifth of the number of compounds in 
the training set. 

c. Stepwise multiple linear regression 

This is useful when the number of independent variables is very 
high. It calculates QSAR equations by adding one variable at a time and 
testing each addition for significance. Only variables found to be significant 
are used in the final QSAR equation. With this method, a correlation matrix 
is compared. 
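
The stepwise procedure described above can be sketched in a few lines of 
Python. This is only an illustrative sketch under stated assumptions, not 
the program used in the present work: the function names (solve, rss, 
forward_stepwise), the F-to-enter threshold of 4.0 and the small data set 
are all hypothetical. At each step the candidate descriptor with the 
largest partial F value is added, and the procedure stops when no remaining 
descriptor passes the threshold. 

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(v * c for v, c in zip(r, b))) ** 2
               for r, yi in zip(X, y))


def forward_stepwise(descriptors, y, f_to_enter=4.0):
    """Return indices of descriptors in the order in which they enter."""
    n = len(y)
    selected = []
    current = rss([], y)  # intercept-only model
    while len(selected) < len(descriptors):
        df = n - (len(selected) + 1) - 1  # n - k - 1 for the enlarged model
        if df <= 0:
            break
        best = None
        for j in range(len(descriptors)):
            if j in selected:
                continue
            new = rss([descriptors[m] for m in selected + [j]], y)
            # Partial F: drop in residual sum of squares per residual variance
            partial_f = float("inf") if new == 0 else (current - new) / (new / df)
            if best is None or partial_f > best[0]:
                best = (partial_f, j, new)
        if best is None or best[0] < f_to_enter:
            break
        selected.append(best[1])
        current = best[2]
    return selected


# Hypothetical data: activity depends on descriptors 0 and 2, not on 1.
descriptors = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    [0.3, 0.9, 0.1, 0.7, 0.5, 0.2, 0.8, 0.4, 0.6, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
]
activity = [5.05, 3.97, 9.02, 7.96, 10.01, 15.03, 16.98, 16.04, 20.95, 19.99]
print(forward_stepwise(descriptors, activity))
```

In this invented data set the first two descriptors to enter are those on 
which the activity values were built. A real application would also test 
entered variables for removal and would check the result by cross-validation. 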

1.3.2 DISCRIMINANT ANALYSIS 

Discriminant analysis was first introduced into QSAR by 
Yvonne Martin in 1974. It is an extension of regression analysis. It is a 
statistical technique which allows exploration of the significance of the 
correlation between a crude activity parameter (the group) and either 
continuous variables or discontinuous indicator variables taking the value 
1 or 0 according to the presence or absence of certain molecular features. 
Thus it separates objects with different properties, for example active and 
inactive compounds, by deriving a linear combination of some other features. 

a. COMPACT (Computer Optimized Molecular Parametric 
Analysis of Chemical Toxicity) is a discriminant analysis 
approach used to predict toxicities. 

b. ALS (Adaptive Least Squares Analysis) is a modification of 
discriminant analysis, which separates several activity classes by 
a single discriminant function. 

c. ORMUCS (Ordered Multicategorial Classification using 
Simplex Technique) is an ALS-related approach which applies 
the simplex technique for derivation of the discriminant function. 
Recently a fuzzy version was developed and used in QSAR 
studies. 

d. SIMCA (Similarity, Chemistry and Analogy) is a class 
modeling technique which places objects from P-dimensional 
space into lower-dimensional boxes. Discrimination of objects of 
different classes is possible by deriving separate principal 
component models for each class. FA, FCA and NIPALS are 
some other methods used. 

1.3.4 PARTIAL LEAST SQUARE ANALYSIS (PLS) 

Partial least squares analysis (PLS) is one of the most useful 
multivariate statistical methods. Many, even hundreds or thousands of, 
independent variables can be correlated with one or several dependent 
variables. Apparently perfect correlations are often obtained because of 
the usually large number of independent variables. Owing to the complexity 
of the PLS algorithm and the ready availability of computer programmes for 
regression analysis, it is not much used in classical QSAR. For 3D-QSAR 
methods like CoMFA, however, PLS analysis is the method of choice. 

1.3.5. CLUSTER SIGNIFICANCE ANALYSIS (CSA) 

To evaluate a congeneric series of compounds, a graph may be 
plotted with the biological data (for example active/inactive) on the Y-axis 
and the physico-chemical parameters on the X-axis. Sometimes the active 
compounds tend to cluster in a relatively confined region of the graph. Such 
clustering suggests that there is a connection between the parameters and 
biological activity. The advantage of the method is that qualitative or 
rank-ordered biological data can be used. 

1.3.6 VALIDATION OF QSAR 

A QSAR gives us information on how changes in the structure 
of the actual compounds influence their biological activity. This, in turn, 
allows us to (a) modify the structure to improve drug potency, decrease 
toxicity, etc. and (b) improve our understanding of the actual biological 
mechanism. 
$$\log (1/C) = b_0 + b_1 \pi + b_2 \sigma + b_3 MR + b_4 (\ldots) \qquad (28)$$


The first necessary condition for model validity is that R² is close 
to 1.0 (R² > 0.90, r > 0.95) and s is small, say, smaller than 0.3, if 
Y = log 1/C. However, a large R² and a small s are not sufficient for model 
validity, because regression models give a closer fit the larger the number 
of parameters and terms in the model. 

Recent developments in statistics provide us with a new and 
interesting set of measures of validity that are based on simulating the 
predictive power of a model. These tools, bootstrapping and 
cross-validation, operate by creating a number of slight modifications of 
the original data set, estimating parameters from each of these modified 
data sets and then calculating the variability of the predictions by each of 
the resulting models. 

Cross-validation, which is the simplest to apply, creates a number 
of modified data sets in such a way that each observation is taken away once 
and once only. Then one model is developed for each reduced data set and the 
response values (y) of the deleted observations are predicted from the 
model. The squared differences between predicted and actual values are added 
to the Predictive Residual Sum of Squares (PRESS). 

PRESS is a good estimate of the real prediction error of the 
model, provided that the observations are independent. If PRESS is 
smaller than the sum of squares of the response values (SSY), the model 
predicts better than chance and can be considered "statistically 
significant". The ratio PRESS/SSY can also be used to calculate 
approximate intervals of predictions of new observations (compounds). 

To be a reasonable QSAR model, PRESS/SSY should be smaller 
than 0.4, and a value of this ratio smaller than 0.1 indicates an excellent 
model. 

$$\mathrm{PRESS} = \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{(1 - h_{ii})^2} \qquad (29)$$

where $y_i$ and $\hat{y}_i$ are the response (activity) values of 
observation i (i = 1, 2, ..., n), observed and predicted by the model 
respectively. The diagonal elements of the "hat" matrix H, 
$H = X (X'X)^{-1} X'$, are denoted by $h_{ii}$. X (n × p) is the data matrix 
containing one column for each of the p terms of the model. 
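
Equation (29) can be checked numerically against an explicit leave-one-out 
calculation. The following Python sketch is an illustration only: the 
helper names (solve, fit, press_hat, press_loo, ssy) and the two-descriptor 
data set are hypothetical and are not taken from the present work. It 
computes PRESS once from the diagonal of the hat matrix and once by 
refitting the model with each observation deleted in turn. 

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def normal_matrix(X):
    """X'X for a data matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]


def fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(normal_matrix(X), xty)


def predict(row, b):
    return sum(v * c for v, c in zip(row, b))


def press_hat(X, y):
    """PRESS from a single fit, using the hat-matrix diagonal h_ii."""
    b = fit(X, y)
    xtx = normal_matrix(X)
    total = 0.0
    for row, yi in zip(X, y):
        h_ii = predict(row, solve(xtx, row))  # x_i' (X'X)^-1 x_i
        total += ((yi - predict(row, b)) / (1.0 - h_ii)) ** 2
    return total


def press_loo(X, y):
    """PRESS by refitting with each observation left out once."""
    total = 0.0
    for i in range(len(y)):
        b = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(X[i], b)) ** 2
    return total


def ssy(y):
    """Sum of squares of the response values about their mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


# Hypothetical descriptors and activities for eight compounds.
x1 = [0.2, 0.5, 0.9, 1.1, 1.6, 1.8, 2.4, 2.9]
x2 = [1.0, 0.3, 0.8, 0.1, 0.6, 0.9, 0.2, 0.5]
y = [5.1, 5.6, 6.0, 6.4, 6.8, 6.9, 7.7, 8.1]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # first column is the intercept

print("PRESS (hat matrix):", press_hat(X, y))
print("PRESS (explicit LOO):", press_loo(X, y))
print("PRESS/SSY:", press_hat(X, y) / ssy(y))
```

The two PRESS values agree to within rounding error, which is why the hat 
matrix form is convenient for ordinary regression: only one fit is needed. 
For methods such as PLS the explicit refitting route has to be used. 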

Cross-validation does not work well when: 

(i) The observations are strongly grouped and hence not 
independent. With QSAR, this often happens when two or more 
different types of compounds are put in the same model. 

(ii) Cross-validation is applied after variable selection in 
stepwise multiple regression. 

Cross-validation may be applied to large data sets, where it is 
used to select the model having the highest predictive ability. In 
cross-validation, many PLS runs are performed in which one or several 
objects are eliminated from the data set, either randomly or in a systematic 
manner, and the excluded objects are predicted by the corresponding model. 
When a single object is eliminated at a time, this is called the 
leave-one-out (LOO) technique. 

1.3.7 EVALUATION OF QSAR EQUATION 

When the number of variables exceeds three, the results cannot 
be expressed in the form of graphs or models. Therefore, a regression 
equation remains the only method of expression that can be used in this 
situation. Terms commonly used in regression analysis are: 

1. Correlation Coefficient (r) 

It is a relative measure of the quality of fit of the model. Its value 
depends on the overall variance of the data. A high value of r (r > 0.90) 
indicates that the statistical significance of the regression equation is 
high, while a low value of r indicates that the substituent constant is not 
important for the process under consideration. If the r value does not 
decrease significantly when a particular substituent constant is omitted 
from the equation, it means that the process represented by the equation is 
least affected by the factor symbolized by that particular substituent 
constant. Because r depends on the overall variance, the correlation 
coefficients of two subsets may be relatively small while the correlation 
coefficient derived from the combined set is much larger. 

2. Square of the Correlation Coefficient (R²) 

It is a measure of the explained variance, usually expressed as a 
percentage, i.e. it gives the percentage of the variance in the data that is 
explained by that particular equation. For example, if r = 0.7 then 
R² = 0.49, so 49% of the variance is accounted for by regression on those 
parameters, leaving 51% still unaccounted for. Thus, the value of r can be 
improved by the inclusion of further parameters. The greater the value of 
R², the smaller the variance that remains unaccounted for by the equation. 

3. Standard Deviation (s) 

This value gives us an idea about the precision of the equation. 
The smaller the value of s, the greater the accuracy with which the expected 
potency of a new compound may be estimated. It is an absolute measure of 
the quality of fit. Its value takes into account the number of objects (n) 
and the number of variables (k). Therefore s depends not only on the quality 
of fit but also on the number of degrees of freedom (DF). 

DF = n - k - 1 (30) 


The larger the number of objects and the smaller the number of 
variables, the smaller the standard deviation (s) for a certain value of 
$\sum \Delta^2$. Normally, s should be around 0.3. 

$$s^2 = \frac{\sum \Delta^2}{n-k-1} = \frac{(1-R^2)\, S_{yy}}{n-k-1} \qquad (31)$$
4. Standard Error of the Coefficient 

The figure in brackets following the coefficient represents the 
standard error of the coefficient, which means that if the experiment is 
repeated, the coefficient should lie between these limits. The higher the 
standard error, the less reliable is the coefficient and the lower the 
possibility that the variable it represents is related to the biological 
response. 

5. Number of compounds utilized (n) 

For a good correlation, a large number of compounds must be used. 
The value of r must be assessed with reference to n; for example, r = 0.89 
for n = 10 is a better correlation than r = 0.98 for n = 3. 


6. F value 


It is a measure of statistical significance of regression model. 
Only F values being larger than the 95% significance limit prove the 
overall significance of a regression equation. 


$$F = \frac{R^2 (n-k-1)}{k (1-R^2)} \qquad (32)$$

It evaluates the statistical validity of a particular equation. For a 
particular equation, if $F_{\text{standard}}$ (i.e. 13.74) < 
$F_{\text{calculated}}$, then the relationship represented by that equation 
is statistically significant. 
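
The quantities defined in equations (30) to (32) can be illustrated with a 
short Python sketch. It is a minimal example for a single descriptor 
(k = 1); the function names and the data are hypothetical and serve only to 
show how r, R², s and F follow from one another. 

```python
import math


def f_value(r2, n, k):
    """F statistic of equation (32) from R^2, n compounds and k variables."""
    return r2 * (n - k - 1) / (k * (1.0 - r2))


def regression_statistics(x, y):
    """r, R^2, s and F for a least-squares line with one descriptor (k = 1)."""
    n, k = len(y), 1
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    df = n - k - 1                           # equation (30)
    r2 = 1.0 - residual_ss / syy
    return {
        "r": math.copysign(math.sqrt(r2), slope),
        "r2": r2,
        "s": math.sqrt(residual_ss / df),    # equation (31)
        "F": f_value(r2, n, k),              # equation (32)
        "syy": syy,
        "df": df,
    }


# Hypothetical descriptor values and activities (log 1/C) for eight compounds.
x = [0.2, 0.5, 0.9, 1.1, 1.6, 1.8, 2.4, 2.9]
y = [5.3, 5.4, 6.2, 6.1, 6.9, 6.7, 7.8, 8.0]
stats = regression_statistics(x, y)
print({name: round(value, 3) for name, value in stats.items()})

# The comparison made under "Number of compounds utilized (n)":
print(f_value(0.89 ** 2, 10, 1), f_value(0.98 ** 2, 3, 1))
```

Applied to the example given above, equation (32) yields F of about 30.5 
for r = 0.89 with n = 10 and about 24.3 for r = 0.98 with n = 3. The 
tabulated 95% limits are roughly 5.3 for (1, 8) degrees of freedom and 
roughly 161 for (1, 1), so only the first correlation is significant, 
although its r value is lower. 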
CHAPTER-2 


BIOACTIVE COMPOUNDS 
FOR THE PRESENT 
STUDY - AN OVERVIEW 
ESTROGEN RECEPTORS 




"Estrogens" are a family of related molecules that stimulate the 
development and maintenance of female characteristics and sexual 
reproduction. The natural estrogens produced by women are steroid 
molecules, which means that they are derived from a particular type of 
molecular skeleton containing four rings of carbon atoms, giving the shape 
shown here. 



Figure 1 

The most prevalent forms of human estrogen are estradiol and 
estrone (figure 1). Both are produced and secreted by the ovaries, although 
estrone is also made in the adrenal glands and other organs. 

Estrogens are hormones, which means that they function as 
signaling molecules. A signaling molecule exerts its effects by traveling 
through the bloodstream and interacting with cells in a variety of target 
tissues. The breast and the uterus, which play central roles in sexual 
reproduction, are two of the main targets of estrogen. In addition, estrogen 
molecules act on the brain, bone, liver, and heart. 

2.1 ESTROGEN RECEPTORS AND GENE ACTIVATION 

Estrogens act on target tissues by binding to parts of cells called 
estrogen receptors. An estrogen receptor is a protein molecule found inside 
those cells that are targets for estrogen action. Estrogen receptors contain a 
specific site to which only estrogens (or closely related molecules) can 
bind. The target tissues affected by estrogen molecules all contain estrogen 
receptors; other organs and tissues in the body do not. Therefore, when 
estrogen molecules circulate in the bloodstream and move throughout the 
body, they exert effects only on cells that contain estrogen receptors. 
Estrogen receptors normally reside in the cell's nucleus, along with DNA 
molecules. In the absence of estrogen molecules, these estrogen receptors 
are inactive and have no influence on DNA (which contains the cell's 
genes). But when an estrogen molecule enters a cell and passes into the 
nucleus, the estrogen binds to its receptor, thereby causing the shape of the 
receptor to change (figure 2). This estrogen-receptor complex then binds to 
specific DNA sites, called estrogen response elements, which are located 
near genes that are controlled by estrogen. After it has become attached to 
estrogen response elements in DNA, this estrogen-receptor complex binds 
to coactivator proteins and more nearby genes become active. The active 
genes produce molecules of messenger RNA, which guide the synthesis of 
specific proteins. These proteins can then influence cell behavior in 
different ways, depending on the cell type involved. 

The estrogen receptor (ER) is a ligand-modulated transcription 
factor that regulates the activity of certain genes. A member of the nuclear 
hormone receptor gene superfamily, ER has a multidomain structure, with 
two conserved domains that are responsible for DNA binding on the one hand, 
and ligand binding, dimerization, and transcriptional activation on the 
other. The binding of ligands to the hormone-binding domain of ER 
stabilizes the interaction of the receptor with target sequences in the 
regulatory region of these genes. This binding may be either directly to 
specific DNA enhancer sequences or in some cases to AP1 enhancers 
through the AP1 transcription factors Fos and Jun. The activation or 
repression of these genes by the ligand-receptor complex is then mediated 
by the recruitment by ER of a variety of coregulatory proteins that interact 
with components of the basal transcriptional complex and have enzymatic 
activity that alters the architecture of chromatin. 
For a compound to have estrogenic or antiestrogenic activity 
through this complex pathway, however, it must first bind to the estrogen 
receptor protein. There are two estrogen receptors, designated as estrogen 
receptor α and β (ERα and ERβ). These receptor subtypes are related 
in both structure and function, but they have different tissue 
distributions and somewhat different amino acid sequences in their 
ligand-binding domains. Thus these two ER subtypes have somewhat 
different ligand-binding characteristics and gene-activating activity, 
although much less is known about ERβ than ERα. 

Because estrogens can act through different ER subtypes, and the 
ligand-ER complex can utilize, at different genes, a variety of different 
response elements and, in different cells, varied levels of different 
coregulatory proteins, it is not surprising that the pharmacology of 
estrogenic compounds is complex. 

Transcriptional activation is mediated by two different activation 
functions, one of which is controlled by ligand binding (AF-2). A family of 
proteins called transcriptional coactivators interact with agonist-bound 
receptors to mediate transcription. This interaction occurs through one or 
more Nuclear Receptor interaction regions, or NR boxes, which contain the 
conserved LXXLL sequence motif. The p160 family of coactivators 
contains multiple NR boxes that recognize different NRs with varying 
affinities. For the steroid receptors, function is also controlled by the 
binding of a large chaperone complex that includes Hsp-90 to the ligand 
binding domain (LBD). Formation of this complex is apparently required 
for maintaining the receptor in a ligand binding-competent state. Upon 
ligand binding, the chaperones dissociate, allowing the receptor to bind 
DNA and regulate transcription. 

The ERα LBD can bind to pure agonists such as the endogenous 
estrogen 17β-estradiol (E2) or the synthetic estrogen diethylstilbestrol 
(DES), and to pure antagonists such as ICI-164,384. Other compounds such as 
tamoxifen and raloxifene (RAL) act as antagonists in particular tissue and 
promoter contexts. This agonist/antagonist behavior is clinically important 
for treating cancer. Recently, breast cancer prevention trials with tamoxifen 
showed a 45% reduction in breast cancer incidence and a decreased 
occurrence of bone fractures. However, a significant increase in the 
incidence of endometrial cancer was also reported. 

We are particularly interested in understanding how ligands 
modulate transcriptional activity and the role of molecular chaperones in 
this process. This is a critical step in the rational optimization of 
compounds for the successful treatment and prevention of breast and other 
cancers. Furthermore, lessons learned from studies on the ER should be 
applicable to the broad family of NRs. As a first step in this process, we 
have recently solved the structures of the ERα LBD bound to either the 
antagonist 4-hydroxytamoxifen (OHT; the active tamoxifen metabolite) or 
to the synthetic estrogen DES and a peptide from the GRIP1 coactivator NR 
Box 2. This work is in collaboration with Dr. Geoffrey Greene, U. Chicago. 
These structures have been quite informative about the coupling between 
ligand binding and the functional state of AF-2. 

Previous work had indicated that although E2 and RAL bind at 
the same site within the core of the ERα LBD, each of these ligands induces 
a different conformation of the last helix in the LBD, helix 12. With agonist 
bound, helix 12 packs against helices 3,5/6 and 11; by contrast with 
antagonist bound the position of helix 12 is quite different - now occupying 
a hydrophobic groove constructed from helices 3 and 5. Mutagenesis has 
shown that residues in this cleft and on helix 12 form part of the AF-2 
recognition surface. In our DES complex, the NR box peptide is bound in 
an a-helical conformation by the hydrophobic groove formed from helices 
3,4,5, and 12. In the OHT complex, instead of forming part of AF-2, helix 
12 binds to, and occludes, the coactivator recognition box using an 
LXXML motif to mimic the LXXLL from the NR box. The positioning of 
helix 12 is directed by effects on the secondary and tertiary structure of the 
LBD programmed by ligand binding. Agonists stabilize secondary 
structural elements, extending the lengths of helices 3,8 and 11. This then 
shortens the loop between helices 11 and 12, which in turn does not allow 
helix 12 to fit in the hydrophobic coactivator pocket. The precise geometry 
of the ligand and its interactions with different regions of LBD lead to a 
differential stabilization. 

2.2 ESTROGEN-INDUCED STIMULATION OF CELL 
PROLIFERATION 

In some target tissues, the main effect of estrogen is to cause cells 
to grow and divide, a process called cell proliferation. 

In breast tissue, for example, estrogen triggers the proliferation of 
cells lining the milk glands, thereby preparing the breast to produce milk if 
the woman should become pregnant. 

Estrogen also promotes proliferation of the cells that form the 
inner lining, or endometrium, of the uterus, thereby preparing the uterus for 
possible implantation of an embryo (figure 3). During a normal menstrual 
cycle, estrogen levels fall dramatically at the end of each cycle if pregnancy 
does not occur. As a result, the endometrium disintegrates and is shed from 
the uterus and vagina in a bleeding process called menstruation. 

2.3 ESTROGEN: BENEFICIAL AND HARMFUL EFFECTS 

Paradoxically, estrogen can be both a beneficial and a harmful 
molecule (figure-4). 

The main beneficial effects of estrogen include its roles in: 

1. Programming the breast and uterus for sexual reproduction, 

2. Controlling cholesterol production in ways that limit the buildup 
of plaque in the coronary arteries, and 

3. Preserving bone strength by helping to maintain the proper 
balance between bone buildup and breakdown. 

The decreased production of ovarian steroids which occurs after 
the climacteric has been linked to a number of postmenopausal pathologies. 
These include osteoporosis, coronary heart disease, hot flushes and vaginal 
dyspareunia. Indeed, the clinical use of long-term estrogen-based 
hormone replacement therapy (ERT) in postmenopausal women has 
proven to be a highly effective method for reducing the risks associated 
with these degenerative diseases. However, in spite of the fact that the 
positive effects of such long-term ERT are increasingly accepted, the 
benefits are achieved at the expense of a number of negative side effects, 
including uterine bleeding, endometrial hyperplasia, endometrial cancer, 
and an increased risk of developing breast cancer. The uterine side effects, 
however, may be reduced by co-treatment with progesterone therapy. These 
negative side effects, in turn, frequently lead to reduced patient 
compliance and reluctance to accept ERT as a form of treatment. 

2.4 CANCER ARISES FROM DNA MUTATIONS IN CELLS 

Cancer is caused by DNA damage (i.e., mutations) in genes that 
regulate cell growth and division. 

Some mutations are inherited, while others are caused by 
exposure to radiation or to mutation-inducing chemicals such as those 
found in cigarette smoke. Mutations also can occur spontaneously as a 
result of mistakes that are made when a cell duplicates its DNA molecules 
prior to cell division (figure-5). 
When cells acquire mutations in specific genes that control 
proliferation, such as proto-oncogenes or tumor suppressor genes, these 
changes are copied with each new generation of cells. Later, more 
mutations in these altered cells can lead to uncontrolled proliferation and 
the onset of cancer. 

2.5 ESTROGEN-INDUCED PROLIFERATION OF EXISTING 
MUTANT CELLS 

Although estrogen does not appear to directly cause the DNA 
mutations that trigger the development of human cancer, estrogen does 
stimulate cell proliferation (figure-6). 

Therefore, if one or more breast cells already possesses a DNA 
mutation that increases the risk of developing cancer, these cells will 
proliferate (along with normal breast cells) in response to estrogen 
stimulation. The result will be an increase in the total number of mutant 
cells, any of which might thereafter acquire the additional mutations that 
lead to uncontrolled proliferation and the onset of cancer. 

In other words, estrogen-induced cell production leads to an 
increase in the total number of mutant cells that exist. These cells are at 
increased risk of becoming cancerous, so the chances that cancer may 
actually develop are increased. 

2.6 ESTROGEN-INDUCED PROLIFERATION AND 
SPONTANEOUS NEW MUTATIONS 

Even in women who do not have any mutant breast cells, 
estrogen-induced proliferation of normal breast cells may still increase the 
risk of developing cancer. 

The reason involves DNA. A cell must duplicate its DNA molecules 
prior to each cell division, thereby ensuring that the two new cells resulting 
from the process of cell division each receive one complete set of DNA 
molecules (figure-7). But the process of DNA duplication occasionally 
makes mistakes, so the resulting DNA copies may contain a small number 
of errors (i.e., mutations). If one of these spontaneous mutations occurs in a 
gene that controls cell growth and division, it could lead to the development 
of cancer. 


Proliferation of normal cells from exposure to estrogen creates a 
vulnerability to spontaneous mutations, some of which might represent a 
first step on the pathway to cancer. 

2.7 ESTROGEN AND BREAST CANCER 

During each menstrual cycle, estrogen normally triggers the 
proliferation of cells that form the inner lining of the milk glands in the 
breast. 


If pregnancy does not occur, estrogen levels fall dramatically at 
the end of each monthly menstrual cycle. In the absence of high estrogen 
levels, those milk gland cells that have proliferated in any given month will 
deteriorate and die, followed by a similar cycle of cell proliferation and cell 
death the following month. For the average woman, this means hundreds of 
cycles of breast cell division and cell death repeated over a span of roughly 
40 years, from puberty to menopause. 

2.8 ESTROGEN AND UTERINE CANCER 

In the uterus, estrogen triggers the proliferation of endometrial 
lining cells during each month of the menstrual cycle, followed by death of 
these cells during menstruation. Over a span of 40 years, from puberty to 
menopause, hundreds of cycles of cell division and cell death will occur. 

These repeated cycles of estrogen-induced cell division tend to 
increase the risk of developing cancer in the same two ways as in the 
breast: Estrogen can stimulate the division of uterine cells that already have 
DNA mutations, and it also increases the chances of developing new, 
spontaneous mutations when estrogen stimulates cell proliferation. Whether 
the mutations are inherited or spontaneous, estrogen-driven proliferation 
will increase the number of these altered cells that can ultimately lead to the 
development of uterine cancer. 

2.9 ANTIESTROGENS 

Since estrogen can promote the development of cancer in the 
breast and uterus, it seems logical to postulate that substances that block the 
action of estrogen might be helpful in preventing or treating these two types 
of cancer (figure-8). 

This rationale has led scientists to work on the development of 
"antiestrogen" drugs that can block the action of estrogens and thereby 
interfere with, or even prevent, the proliferation of breast and uterine cancer 
cells. Antiestrogens work by binding to estrogen receptors, blocking 
estrogen from binding to these receptors. This also blocks estrogen from 
activating genes for specific growth-promoting proteins. 

2.10 SELECTIVE ESTROGEN RECEPTOR MODULATORS 
(SERMs) 

In working on the development of antiestrogens, scientists have 
made a somewhat surprising discovery. Some drugs that block the action of 
estrogen in certain tissues actually can mimic the action of estrogen in other 
tissues. 


Such selectivity is made possible by the fact that the estrogen 
receptors of different target tissues vary in chemical structure. These 
differences allow estrogen-like drugs to interact in different ways with the 
estrogen receptors of different tissues. Such drugs are called selective 
estrogen receptor modulators, or SERMs, because they selectively 
stimulate or inhibit the estrogen receptors of different target tissues. For 
example, a SERM might inhibit the estrogen receptor found in breast cells 
but activate the estrogen receptor present in uterine endometrial cells. A 
SERM of this type would inhibit cell proliferation in breast cells, but 
stimulate the proliferation of uterine endometrial cells. 

In brief, these compounds, which have a mixed endocrine profile that 
affords agonistic or antagonistic activity in a tissue-specific manner, hold 
the promise of a safer alternative to estrogen. Some members of the 
SERM class were initially called "antiestrogens" because of their 
high-affinity binding to estrogen receptors (ERs) and ability to counteract 
estrogen action. However, this nomenclature has proved inadequate to fully 
describe the actions of these agents. The most commonly studied SERMs 
are tamoxifen, raloxifene, droloxifene, toremifene, idoxifene, 
levormeloxifene, CP-366156, EM-800, GW-5638 and LY 353381 (figure-9). 
All these SERMs have tissue-selective action. They are now being used for 
conditions associated with aging, hormone-responsive cancer, osteoporosis 
and cardiovascular diseases, and for serum lipid lowering. SERMs are 
therefore currently the archetypes for a rich category of drug therapies 
based on a single molecular target, the ER. 
2.10.1 TAMOXIFEN AND CANCER 


The first SERM to be investigated extensively for its anticancer 
properties is a drug called tamoxifen. 

Tamoxifen blocks the action of estrogen in breast tissue. Tamoxifen 
exerts this antiestrogenic effect by binding to the estrogen receptors of 
breast cells, thereby preventing estrogen molecules from binding to these 
receptors. But unlike the normal situation, when estrogen binds to its 
receptor, the binding of tamoxifen to the receptor does not cause the 
receptor molecule to acquire the changed shape that allows it to bind to 
coactivators. As a result, the genes that stimulate cell proliferation cannot 
be activated. 

By interfering with estrogen receptors in this way, tamoxifen 
blocks the ability of estrogen to stimulate the proliferation of breast cells. 

2.10.2 TAMOXIFEN AND BREAST CANCER TREATMENT 

In women who have breast cancer, proliferation of the breast 
cancer cells is often driven by estrogen, just as in the case of normal breast 
cells. 


Since tamoxifen can block the effects of estrogen on breast cells, 
scientists predicted that breast cancer could be treated by using tamoxifen 
to interfere with estrogen-induced cell proliferation. Based on encouraging 
results obtained in experimental trials, tamoxifen was first approved for 
such use in breast cancer treatment in the 1970s. 

The first step in treating women with breast cancer is to 
surgically remove the cancer from the breast. It is difficult to be certain that 
every cancer cell has been removed at the time of surgery because some 
breast cancer cells could have spread to surrounding tissues or other organs 
prior to the operation. Therefore, women often receive some type of 
treatment after surgery (adjuvant therapy) to prevent the growth of any 
cancer cells that might remain in the body. Studies show that when 
tamoxifen is used for this purpose, the risk of cancer recurrence is reduced. 

2.10.3 TAMOXIFEN AS A CAUSE OF UTERINE CANCER 

Although tamoxifen has been useful both in treating breast cancer 
patients and in decreasing the risk of getting breast cancer in women at high 
risk, it also has some serious side effects. 

These side effects arise from the fact that while tamoxifen acts as 
an antiestrogen that blocks the effects of estrogen on breast cells, it mimics 
the actions of estrogen in other tissues such as the uterus. Its estrogen-like 
effects on the uterus stimulate proliferation of the uterine endometrium and 
increase the risk of uterine cancer. 

2.10.4 SEARCH FOR THE PERFECT SERM 

Because of the potential cancer and cardiovascular risks inherent 
in hormone pills containing estrogen and progesterone, scientists are 
working on the development of SERMs for postmenopausal women that 
can mimic the beneficial effects of estrogen without exerting any of its 
harmful effects. 

The ideal drug, of course, would be a SERM exhibiting the 
positive effects of estrogen on bones, heart, and blood vessels, without 
exhibiting the potentially harmful effects of estrogen on the breast and 
uterus. 

2.10.5 RALOXIFENE AND THE PREVENTION OF 
OSTEOPOROSIS 

One SERM that may exhibit some of these properties is 
raloxifene, a drug approved by the FDA in 1997 for preventing 
osteoporosis in postmenopausal women. 
Raloxifene appears to function like estrogen in bone, acting to 
maintain bone strength and increase bone density. In addition, raloxifene 
also resembles estrogen in its ability to lower LDL cholesterol levels, 
thereby decreasing the risk of heart disease. 

Although information on the long-term risks and benefits of 
raloxifene is limited compared to tamoxifen, preliminary evidence suggests 
that raloxifene may exert these beneficial effects on bones, heart, and blood 
vessels without increasing a woman's risk of developing cancer. 

2.10.6 RALOXIFENE AND THE POSSIBLE PREVENTION 
OF CANCER 

Preliminary evidence suggests that raloxifene may actually turn 
out to be helpful in preventing cancer. 

In animal studies, raloxifene has already been shown to reduce 
the incidence of both breast and uterine cancer. And in preliminary human 
trials, raloxifene has been found to reduce the risk of breast cancer without 
the unwanted stimulation of uterine cell division that is exhibited by 
tamoxifen. 

As a result of these preliminary findings, the National Cancer 
Institute is sponsoring a human clinical study to directly compare the 
effects of tamoxifen and raloxifene in postmenopausal women. The trial, 
named STAR (Study of Tamoxifen and Raloxifene), was begun in 1999 
and will follow more than 20,000 women for a period of 5 to 10 years. 

Even if the STAR trial confirms the effectiveness of raloxifene in 
reducing the risk of breast and uterine cancer, raloxifene is still not the 
perfect drug. It does not reduce the frequency of hot flashes associated with 
menopause and, like estrogen, it increases the risk of blood clots. Just as 
tamoxifen was an important milestone, if a single SERM like raloxifene is 
found to protect women against osteoporosis, heart disease, breast cancer, 
and uterine cancer, it will represent an important milestone in women's 
health.

The present thesis deals with the application of QSAR techniques to
the following types of SERMs, all of which are non-steroidal except the
estradiol-16α-carboxylic acid esters, which are estradiol derivatives:

1. Pyrazole ligands

2. Estradiol-16α-carboxylic acid esters

3. Diaryl-dialkyl-substituted pyrazoles

4. Ligands for the anti-estrogen binding site (AEBS)

5. 2-Amino-4,6-diarylpyridines

6. cis-3,4-Diaryl-hydroxychromanes

All the above-mentioned ligands are non-steroidal in nature except
the estradiol-16α-carboxylic acid esters, which belong to the class of
steroidal estrogens. All these ligands have high affinity and selectivity
for the estrogen receptor. The binding affinity of the ligands on which
QSAR has been done was determined by different competitive binding
assay methods, and the values have been divided into two scales: the
relative binding affinity (RBA) scale and the estrogen receptor (ER)
binding (IC50) scale.

It probably does not make a great difference which species and
target tissue are used as the source of the estrogen receptor for these
binding studies, because there is little evidence for species differences
in structure-affinity relationships, and in most of the target tissues
used the ERα subtype will predominate.

All the physicochemical parameters used in the present work were
automatically loaded from the DRAGON software developed by Todeschini, R.,
Consonni, V. et al., and the QSAR regression analyses were executed on a
Compaq PC using SPSS software version 6.0.1.

CHAPTER-3

2D-QSAR
RESULTS AND DISCUSSION



This chapter has been divided into two sections according to the
way the binding affinities of the estrogen receptor ligands are
expressed. In the first section, all the binding affinity values are
placed on a common "relative binding affinity" (RBA) scale. The RBA
values were determined by different competitive binding assay methods
and using different receptor preparations. Values on this scale were
calculated as a percentage from the ratio of the IC50 value of estradiol
to that of the test compound, where IC50 is the concentration needed to
displace 50% of [³H]estradiol from estrogen receptor preparations
(generally uterine cytosol fractions, which are largely estrogen
receptor-α). Thus on this scale estradiol by definition has a value of
100, with lower-affinity ligands having lower values and higher-affinity
ligands higher values.
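
The RBA scale described above can be illustrated with a short sketch. It assumes that RBA is taken as 100 times the IC50 of estradiol divided by the IC50 of the test compound, so that estradiol scores 100 and a stronger binder (smaller IC50) scores higher. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def relative_binding_affinity(ic50_estradiol, ic50_test):
    """Return the RBA (%) of a test compound; estradiol itself gives 100."""
    if ic50_estradiol <= 0 or ic50_test <= 0:
        raise ValueError("IC50 values must be positive")
    return 100.0 * ic50_estradiol / ic50_test


def log_rba(ic50_estradiol, ic50_test):
    """Return log10 of the RBA, the quantity used as the dependent variable."""
    return math.log10(relative_binding_affinity(ic50_estradiol, ic50_test))


# Hypothetical IC50 values in the same concentration unit (e.g. nM).
rba_weak = relative_binding_affinity(2.0, 20.0)    # binds 10x weaker than estradiol
rba_strong = relative_binding_affinity(2.0, 1.0)   # binds 2x stronger than estradiol
```

Both IC50 values must be expressed in the same unit, since only their ratio enters the scale.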

In the second section the binding affinity values are expressed as
IC50 values, determined either with ER extracted from MCF-7 cell lysate
by competition with [³H]17β-estradiol or in estrogen receptor-rich
cytosol derived from rabbit uterine tissue in a dextran-coated charcoal
(DCC) assay [159,160].

SECTION-A


3.1.1 QSAR STUDIES OF PYRAZOLE LIGANDS:
ESTROGEN RECEPTOR-α-SELECTIVE AGONISTS

Compounds in this series include various tetrasubstituted
pyrazoles as high-affinity ligands for the estrogen receptor. Stauffer
et al. have reported the RBA values for purified full-length human ERα
and ERβ using a competitive radiometric binding assay.

In this section QSAR has been performed on tetrasubstituted
pyrazoles for both the ERα and ERβ subtypes using hydrophobic (M Log P),
steric (MR and Vw) and electronic (Xeq) descriptors, which have been
described in Chapter 1. Regression analysis has been used to correlate
the various descriptors with the relative binding affinity (RBA) values.
Table 3.1.1.A contains a set of 15 compounds showing log RBA values for
the ERα and ERβ subtypes along with the data for regression analysis.
Table 3.1.1.B and Table 3.1.1.C represent the correlation matrices of
the descriptors used in the present work for the ERα and ERβ subtypes.
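
The statistics quoted with each regression equation in this chapter (n, r², SE and F) can be obtained from a least-squares fit. The following is a minimal sketch for the one-descriptor case, written in plain Python rather than SPSS; the function name and the data are hypothetical and are not taken from Table 3.1.1.A.

```python
import math


def simple_regression(x, y):
    """Fit y = slope * x + intercept by least squares.

    Returns slope, intercept, r2, the standard error of estimate (SE)
    and the F statistic with (1, n - 2) degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    se = math.sqrt(ss_res / (n - 2))
    f_value = math.inf if ss_res == 0 else (syy - ss_res) / (ss_res / (n - 2))
    return {"slope": slope, "intercept": intercept, "r2": r2, "se": se, "f": f_value}


# Hypothetical descriptor and activity values, for illustration only.
m_log_p = [1.0, 2.0, 3.0, 4.0]
log_rba = [1.1, 1.9, 3.2, 3.8]
fit = simple_regression(m_log_p, log_rba)
```

The multiparametric equations follow the same principle, with the residual degrees of freedom becoming n - k - 1 for k descriptors.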

3.1.1.A QSAR STUDY OF TETRASUBSTITUTED PYRAZOLE
LIGANDS BINDING TO ERα SUBTYPE

The correlation of M Log P, MR, Vw and Xeq with log RBA for ERα
gave the following simple (monoparametric) regression equations:

log RBA = 0.372 (± 0.282) M Log P + 0.302                      (1)
n = 16   r² = 0.215   SE = 0.512   F(1,14) = 0.957

log RBA = 0.661 (± 0.314) MR + 0.817                           (2)
n = 16   r² = 0.183   SE = 0.625   F(1,14) = 0.440

log RBA = -0.418 (± 0.612) Vw + 0.761                          (3)
n = 16   r² = 0.017   SE = 0.688   F(1,14) = 0.393

log RBA = -0.768 (± 0.427) Xeq + 0.378                         (4)
n = 16   r² = 0.369   SE = 0.509   F(1,14) = 1.630

The monoparametric equations here show very poor correlations.
Table 3.1.1.B shows that M Log P does not correlate well with MR or
Xeq, nor does MR correlate with Xeq, so these parameters can be taken
together. The correlations of log RBA with M Log P and MR, M Log P and
Xeq, and MR and Xeq are given as follows:

log RBA = -0.810 (± 0.513) M Log P + 0.468 (± 0.211) MR + 0.326    (5)
n = 16   r² = 0.710   SE = 0.443   F(2,13) = 6.94

log RBA = -0.783 (± 0.443) M Log P + 0.660 (± 0.431) Xeq + 0.684   (6)
n = 16   r² = 0.768   SE = 0.410   F(2,13) = 8.39

log RBA = 0.572 (± 0.418) MR - 0.54 (± 0.529) Xeq + 0.376          (7)
n = 16   r² = 0.472   SE = 0.510   F(2,13) = 1.96

Equations (5) and (6) show significant improvement in correlation
level. They account for 71% and 77% of the variance (R = 0.84 and
R = 0.88), and their F-values are significant at the 95% confidence
level. Equation (7), however, cannot be accepted from a statistical
point of view: the error terms in this equation are quite high and the
F-value is not significant at the 95% confidence level. In an attempt to
improve the degree of correlation of equations (5) and (6), an indicator
parameter (Ind) was introduced for substituents having an isobutyl
group, which gave a fairly good improvement in correlation:

log RBA = -0.932 (± 0.476) M Log P + 0.511 (± 0.310) MR
          + 0.174 (± 0.043) Ind + 0.528                            (9)
n = 16   r² = 0.796   SE = 0.462   F(3,12) = 7.886

log RBA = -0.872 (± 0.342) M Log P - 0.83 (± 0.410) Xeq
          + 0.233 (± 0.008) Ind + 0.382                            (10)
n = 16   r² = 0.865   SE = 0.318   F(3,12) = 11.204

Equation (10) accounts for 87% of the variance (R = 0.93) and the
F-value is significant at the 99% confidence level.


3.1.1.B QSAR STUDY OF TETRASUBSTITUTED PYRAZOLE
LIGANDS BINDING TO ERβ SUBTYPE

The correlation of M Log P, MR, Vw and Xeq with log RBA gave the
following simple linear regression equations:

log RBA = -0.424 (± 0.317) M Log P + 0.243                     (11)
n = 16   r² = 0.281   SE = 0.510   F(1,14) = 0.865

log RBA = 0.48 (± 0.382) MR + 0.776                            (12)
n = 16   r² = 0.136   SE = 0.622   F(1,14) = 0.648

log RBA = 0.634 (± 0.521) Vw - 0.938                           (13)
n = 16   r² = 0.004   SE = 0.821   F(1,14) = 0.402

log RBA = -0.245 (± 0.118) Xeq + 0.239                         (14)
n = 16   r² = 0.189   SE = 0.712   F(1,14) = 0.461

The regression analysis shows very poor correlation with all four
parameters (equations 11, 12, 13 and 14). From Table 3.1.1.C, good
autocorrelation is observed between M Log P and Vw, M Log P and Xeq,
MR and Vw, and Vw and Xeq, so these descriptors cannot be taken together
in multiple regression analysis. However, M Log P and MR, and MR and
Xeq, do not show any autocorrelation, so these parameters have been
taken together, and the MRA performed shows significant improvement in
correlation.

log RBA = -0.830 (± 0.511) M Log P + 0.66 (± 0.511) MR + 0.268     (16)
n = 16   r² = 0.588   SE = 0.416   F(2,13) = 0.87

log RBA = 0.512 (± 0.396) MR - 0.410 (± 0.245) Xeq + 0.211         (17)
n = 16   r² = 0.402   SE = 0.511   F(2,13) = 4.662

On introducing into equations (16) and (17) the same indicator
parameter, Ind, that was used for the ERα subtype, statistically
significant correlations were obtained.

log RBA = -0.741 (± 0.45) M Log P + 0.762 (± 0.410) MR
          + 0.41 (± 0.23) Ind + 0.723                              (18)
n = 16   r² = 0.786   SE = 0.376   F(3,12) = 9.832

log RBA = 0.632 (± 0.307) MR - 0.397 (± 0.197) Xeq
          + 0.33 (± 0.201) Ind + 0.510                             (19)
n = 16   r² = 0.713   SE = 0.504   F(3,12) = 7.314

However, in the hope of obtaining still better results, compound
(16), having the highest residual value from equation (18), was taken
as an outlier and the following regression equation was obtained:

log RBA = -0.543 (± 0.327) M Log P + 0.821 (± 0.372) MR
          + 0.412 (± 0.212) Ind + 0.417                            (20)
n = 15   r² = 0.845   SE = 0.301   F(3,11) = 11.46

Equation (20) accounts for 85% of the variance (R = 0.92) and the
F-value is significant at the 99% confidence level.
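
The outlier step used above, removing the compound with the largest residual and refitting, can be sketched as follows. This is only an illustration of the selection rule; the function names and the numbers are hypothetical, and the predicted values are assumed to come from a previously fitted equation.

```python
def largest_residual_index(observed, predicted):
    """Return the position of the compound with the largest absolute residual."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    residuals = [obs - pred for obs, pred in zip(observed, predicted)]
    return max(range(len(residuals)), key=lambda i: abs(residuals[i]))


def drop_index(values, index):
    """Return a copy of values without the element at the given position."""
    return values[:index] + values[index + 1:]


# Hypothetical observed and calculated log RBA values.
observed = [1.0, 2.0, 3.0, 10.0]
predicted = [1.1, 1.9, 3.2, 4.0]
outlier = largest_residual_index(observed, predicted)
observed_reduced = drop_index(observed, outlier)
```

The reduced data set is then refitted, which lowers n by one and the residual degrees of freedom accordingly.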
CORRELATION MATRIX


TABLE: 3.1.1.B
(FOR ERα SUBTYPE)

           M Log P    MR       Vw       Xeq
M Log P    1.000
MR         0.448      1.000
Vw         0.516      0.671    1.000
Xeq        0.146      0.230    0.637    1.000

TABLE: 3.1.1.C
(FOR ERβ SUBTYPE)

3.1.2 QSAR STUDIES ON ESTRADIOL-16α-CARBOXYLIC
ACID ESTERS AS LOCALLY ACTIVE ESTROGENS 

Compounds in this series include various analogues of estradiol
acting as locally active estrogens without significant systemic action,
useful for the therapeutic treatment of vaginal dyspareunia of menopause
in women for whom systemic estrogens are contraindicated. The RBA
values were studied using the classical assay with rat uterine cytosol
(ERα). In this section QSAR has been performed on
estradiol-16α-carboxylic acid esters. The relative binding affinity
data were correlated with hydrophobic (M Log P), steric (MR and Vw) and
electronic (Xeq) descriptors.

A complete set of molecular descriptors, namely M Log P, MR, Vw,
Xeq and the indicator parameter, for a set of 16 compounds is recorded
along with the log RBA values in Table 3.1.2.A, and Table 3.1.2.B
represents the correlation matrix between the parameters. The
correlation matrix indicates that there is very good autocorrelation
between M Log P and Vw, MR and Xeq, and Vw and Xeq, which is why these
parameters cannot be taken together in multiple regression analysis
(MRA).
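
The screening of descriptor pairs through the correlation matrix can be sketched as below. The cutoff of 0.5 on the absolute correlation coefficient is an assumption made for this illustration, since no numerical threshold is stated in the text; the function names and descriptor values are likewise hypothetical.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def combinable_pairs(descriptors, cutoff=0.5):
    """List descriptor pairs whose |r| is below the (assumed) cutoff."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(descriptors.items(), 2):
        if abs(pearson_r(col_a, col_b)) < cutoff:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical descriptor columns for four compounds.
descriptors = {
    "M Log P": [1.0, 2.0, 3.0, 4.0],
    "Vw": [2.0, 4.0, 6.0, 8.5],
    "Xeq": [1.0, -1.0, -1.0, 1.0],
}
pairs = combinable_pairs(descriptors)
```

Only the pairs returned by such a screen would be carried forward into a multiparametric equation.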

On applying simple linear regression analysis, the correlation of
the relative binding affinity data with M Log P, MR, Vw and Xeq gave the
following simple linear equations:

log RBA = -0.189 (± 0.069) M Log P + 2.155                     (1)
n = 16   r² = 0.401   SE = 0.38   F(1,14) = 5.20

log RBA = 0.614 (± 0.332) MR + 0.487                           (2)
n = 16   r² = 0.289   SE = 0.732   F(1,14) = 0.013

log RBA = 0.671 (± 0.326) Vw + 1.286                           (3)
n = 16   r² = 0.321   SE = 0.501   F(1,14) = 1.281

log RBA = -0.298 (± 0.836) Xeq - 1.120                         (4)
n = 16   r² = 0.129   SE = 0.672   F(1,14) = 0.761

Equations (1), (2), (3) and (4) show very poor correlation. Table
3.1.2.B shows that there is no correlation between M Log P and MR, nor
between M Log P and Xeq, so these parameters can be taken together.
The correlation of the RBA values with M Log P and MR is given as:

log RBA = -1.272 (± 0.529) M Log P + 0.921 (± 0.637) MR + 1.216    (5)
n = 16   r² = 0.768   SE = 0.314   F(2,13) = 19.27

Equation (5) accounts for 77% of the variance (R = 0.876) and the
F-value is significant at the 99% confidence level. Similarly, the
correlation of the RBA values with M Log P and Xeq gave the following
regression equation:

log RBA = -0.922 (± 0.646) M Log P - 0.733 (± 0.396) Xeq + 0.839   (6)
n = 16   r² = 0.614   SE = 0.483   F(2,13) = 8.10

Equation (6) accounts for 61% of the variance (R = 0.784) and the
F-value is significant at the 95% level. Comparing equations (5) and
(6), equation (5) seems statistically more sound. In order to improve
the degree of correlation, an indicator parameter, Ind, has been used.
The indicator parameter has been taken for the estradiol derivative
having a neo-pentyl ester group: its value is taken as unity when this
ester is present and as zero for all other compounds, where it is
absent.
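
The construction of such an indicator parameter can be sketched as follows. The ester labels, descriptor values and function names are hypothetical and serve only to show how the 0/1 column is built and placed beside the other descriptors.

```python
def indicator_column(ester_groups, flagged="neopentyl"):
    """Return 1 for compounds carrying the flagged ester group, 0 otherwise."""
    return [1 if group == flagged else 0 for group in ester_groups]


def design_rows(m_log_p, mr, ind):
    """Combine the descriptor columns into one row per compound."""
    return [list(row) for row in zip(m_log_p, mr, ind)]


# Hypothetical compounds, for illustration only.
esters = ["methyl", "neopentyl", "ethyl"]
ind = indicator_column(esters)
rows = design_rows([2.1, 3.4, 2.5], [1.0, 1.6, 1.2], ind)
```

In a fitted equation the coefficient of Ind then measures the average shift in log RBA associated with the flagged group, all other descriptors being equal.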

MRA gave the following triparametric model, which shows a
significant improvement in correlation:

log RBA = 1.272 (± 0.429) M Log P + 0.821 (± 0.437) MR
          + 0.537 (± 0.311) Ind + 1.889                            (7)
n = 16   r² = 0.910   SE = 0.226   F(3,12) = 18.33

Equation (7) accounts for 91% of the variance (R = 0.954) and the
F-value is significant at the 99% confidence level.

CORRELATION MATRIX

TABLE: 3.1.2.B

           M Log P    MR       Vw       Xeq
M Log P    1.000
MR         0.262      1.000
Vw         0.765      0.449    1.000
Xeq        0.370      0.994    0.839    1.000

3.1.3 QSAR STUDIES ON DIARYL-DIALKYL-SUBSTITUTED
PYRAZOLES

Nishiguchi et al. reported the regioselective synthesis and binding
affinity data of diaryl-dialkyl-substituted pyrazoles for the estrogen
receptor. Compounds of this class have been found to be useful for
menopausal hormone replacement and for the prevention and treatment of
breast cancer. The binding affinity of each pyrazole phenol for human
estrogen receptors (ERα and ERβ) was determined using a competitive
radiometric assay; affinities were expressed relative to that of
estradiol, to give relative binding affinity (RBA) values. In this
discussion QSAR has been performed on a series of substituted pyrazoles
using hydrophobic (M Log P), steric (MR and Vw) and electronic (Xeq)
descriptors.

Table 3.1.3.A lists RBA, M Log P, MR, Vw and Xeq values for a set
of 8 substituted pyrazoles for both the ERα and ERβ subtypes used in
the present study. Table 3.1.3.B and Table 3.1.3.C represent the
correlation matrices between the parameters for the ERα and ERβ
subtypes.

3.1.3.A QSAR STUDY OF SUBSTITUTED PYRAZOLES
BINDING WITH ERα AND ERβ SUBTYPE

Simple linear regression analysis using these physicochemical
parameters independently resulted in the following regression equations:

log RBA = -0.739 (± 0.437) M Log P + 1.288                     (1)
n = 8   r² = 0.528   SE = 0.336   F(1,6) = 3.035

log RBA = -0.116 (± 0.018) MR + 0.972                          (2)
n = 8   r² = 0.768   SE = 0.418   F(1,6) = 12.234

log RBA = -0.820 (± 0.512) Vw + 1.011                          (3)
n = 8   r² = 0.048   SE = 0.821   F(1,6) = 0.326

log RBA = -1.114 (± 1.702) Xeq + 0.327                         (4)
n = 8   r² = 0.118   SE = 0.686   F(1,6) = 0.214

The initial study has shown that the hydrophobic descriptor M Log P
and the steric descriptor MR gave statistically significant
correlations, while the remaining descriptors gave poor correlations.
Table 3.1.3.B shows that there is no autocorrelation between MR and
M Log P, so these descriptors can be taken together.

log RBA = -0.84 (± 0.52) M Log P + 1.162 (± 0.612) MR + 0.461      (5)
n = 8   r² = 0.848   SE = 0.378   F(2,5) = 18.99

Equation (5) shows improvement in the r² value, and the F-value is
also significant at the 99% confidence level.

However, in the hope of obtaining still better results, an
indicator parameter, Ind, for the substituent having a C6H11 group was
used with MR. As a consequence, an excellent correlation was observed.

log RBA = -0.887 (± 0.310) MR + 1.018 (± 0.602) Ind + 0.761        (6)
n = 8   r² = 0.926   SE = 0.284   F(2,5) = 26.33

Equation (6) accounts for 93% of the variance (R = 0.962) and the
F-value is significant at the 99% confidence level; the error term has
also become quite low.

3.1.3.B QSAR STUDY OF SUBSTITUTED PYRAZOLES WITH
ERβ SUBTYPE

The correlation of RBA for the ERβ subtype with hydrophobic, steric
and electronic parameters gave the following regression equations:

log RBA = -0.814 (± 0.562) M Log P + 1.011                     (7)
n = 8   r² = 0.324   SE = 0.378   F(1,6) = 2.954

log RBA = -0.446 (± 0.274) MR + 0.837                          (8)
n = 8   r² = 0.505   SE = 0.481   F(1,6) = 5.077

log RBA = -0.488 (± 0.502) Vw + 0.917                          (9)
n = 8   r² = 0.113   SE = 0.669   F(1,6) = 0.310

log RBA = -0.217 (± 0.228) Xeq + 1.119                         (10)
n = 8   r² = 0.032   SE = 0.811   F(1,6) = 0.332


Equations (9) and (10) show very poor correlations, and (7) and
(8) are not very statistically significant either. However, the
introduction of a square term in MR improved the correlation level.

Equation (11) shows an abrupt increase in correlation level as
compared to equation (8) (R = 0.712 to R = 0.886). This indicates that
MR shows a parabolic relation with the RBA values for the ERβ subtype.
Vw and Xeq do not show any significant improvement in correlation level
on introduction of square terms. M Log P, however, shows a very slight
improvement in correlation level.

log RBA = -0.882 (± 0.344) M Log P + 0.601 (± 0.446) (M Log P)² - 1.022   (12)
n = 8   r² = 0.501   SE = 0.473   F(2,5) = 2.392
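
A parabolic dependence of the kind discussed above has a single turning point, whose position follows directly from the two coefficients of the descriptor. The sketch below assumes a fitted equation of the form log RBA = a·x + b·x² + c; the function name and the coefficients in the example are hypothetical and are not taken from the equations of this section.

```python
def stationary_point(linear_coeff, square_coeff):
    """Locate the turning point of y = linear_coeff*x + square_coeff*x**2 + c.

    Returns the descriptor value at the turning point and whether the
    parabola has a maximum or a minimum there.
    """
    if square_coeff == 0:
        raise ValueError("no squared term: the relation is linear")
    x_opt = -linear_coeff / (2.0 * square_coeff)
    kind = "maximum" if square_coeff < 0 else "minimum"
    return x_opt, kind


# Hypothetical coefficients: y = 1.2*x - 0.3*x**2 + c.
x_opt, kind = stationary_point(1.2, -0.3)
```

A negative coefficient of the squared term corresponds to an optimum descriptor value beyond which the affinity falls again, whereas a positive one corresponds to a minimum.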


In order to improve the degree of correlation, the same indicator
parameter which was used for the ERα subtype was used with MR, and the
following regression equation was obtained:

log RBA = 0.761 (± 0.542) MR + 0.936 (± 0.623) Ind + 0.843         (13)
n = 8   r² = 0.721   SE = 0.396   F(2,5) = 9.311


In equation (13), as compared to equation (6), only a moderate
improvement in correlation level is observed. Equation (13) accounts
for 72% of the variance (R = 0.85) and the F-value is significant at the
95% confidence level. From Table 3.1.3.C there is no autocorrelation
between M Log P and MR, so these descriptors can be taken together. A
significant improvement in correlation was observed:

log RBA = -0.833 (± 0.326) M Log P + 0.794 (± 0.310) MR + 0.831    (14)
n = 8   r² = 0.886   SE = 0.297   F(2,5) = 21.033

Equation (14) accounts for 89% of the variance (R = 0.94) and the
F-value is significant at the 99% confidence level.

TABLE 3.1.3.A: PHYSICOCHEMICAL DATA FOR DIARYL-DIALKYL-SUBSTITUTED PYRAZOLES

CORRELATION MATRIX

TABLE: 3.1.3.B
(FOR ERα SUBTYPE)

           M Log P    MR       Vw       Xeq
M Log P    1.000
MR         0.112      1.000
Vw         0.470      0.361    1.000
Xeq        0.832      0.752    0.681    1.000

TABLE: 3.1.3.C
(FOR ERβ SUBTYPE)

CONCLUSIONS

Thus from the regression analysis data of the three series (3.1.1,
3.1.2 and 3.1.3) for relative binding affinity values in the present
thesis we see that the hydrophobic parameter M Log P and the steric
parameter MR play a crucial role in enhancing the RBA values of the
estrogen receptor ligands. Besides, the electronic parameter, Xeq, also
affects the binding affinity to some extent. Vw, however, showed poor
correlations in all three series, which means that Vw does not play any
significant role in enhancing the binding affinities in the series
undertaken for QSAR.

The large negative coefficient of M Log P indicates that highly
hydrophobic substituents will lower the binding affinity in this class
of ER ligands; however, the positive coefficient of MR indicates that
sterically bulky substituents will raise the binding affinity. Since
the relationship with MR is linear, this implies that the receptor has
some flexibility at this site. It can very well be inferred that some
component of hydrophobicity is embodied in the MR term.

The negative coefficient of the electronic parameter, Xeq, in
equation (10) and equation (11) of series 3.1.1 suggests that
electron-donating groups are favourable for relative binding affinity
values. It is of note that the X-ray crystal structure of the receptor
shows that there is some very hydrophobic space above the B-ring of the
ligand at position 11, presumably sufficient to accommodate groups of
moderate size, whereas larger substituents would require some movement
of ligand or receptor for a complex to form. It can be generalized from
the set of congeners studied in this section that a negative hydrophobic
and a positive steric effect are operative here. Moreover, the
coefficient of the indicator parameter is small and positive, which
indicates that the substituents chosen for the indicator parameter will
increase the relative binding affinities.

It is gratifying that for the estradiol-16α-carboxylic acid esters
we see at the 11β-position the same hydrophobic preference (positive
correlation with MR). The non-polar neo-pentyl ester (positive indicator
parameter) enhances binding affinity. There is no evidence for
significant substituent effects at position 16α, although the size
range of the 16α substituents in this series (3.1.2) is much smaller.
Gontcher et al. studied the relative binding affinities of estradiol
derivatives with multiple substituents at the 2-, 4-, 7α-, 11β- and
17α-positions with the estrogen receptor in the cytosol of mouse
mammary epithelial cells at 0°C. They did a 3D analysis with CoMFA,
which revealed the importance of electrostatic and steric fields.


In the case of the pyrazole ligands (series 3.1.1 and 3.1.3), the
substituents on the phenolic ring were minutely examined, although the
pyrazole nucleus was not neglected either. As expected for aromatic
substituents, for both the ERα and ERβ subtypes steric bulk interferes
with binding. However, it seems that electron-donating groups will
increase the electron density on the phenyl ring, but will also make the
phenolic hydroxyl group less acidic. Thus, the increased affinity of the
derivatives with electron-donating groups could be due to:

(a) An increased electron-transfer interaction between receptor and
ligand, or

(b) An effect on the electron density on the OH.

From the crystal structure of the estrogen receptor ligand-binding
domain, both effects are reasonable, as the A-ring of estradiol is
tightly surrounded by residues and the phenolic hydroxyl group donates
one hydrogen bond (which would be weakened by the increased electron
density) but accepts two hydrogen bonds.


In brief, from equations (9), (10), (18), (19) and (20) of series
3.1.1, (7) of series 3.1.2 and (5), (6), (13) and (14) of series 3.1.3,
we see that the estrogen receptor has a limited tolerance to hydrophobic
effects and a vast tolerance to steric effects of the substituents. The
receptor site has a hydrophobic binding domain.



SECTION-B 


3.2.1 QSAR STUDIES OF NEW LIGANDS FOR THE
MICROSOMAL ANTI-ESTROGEN BINDING SITE (AEBS) [169]

Compounds in this series include the diphenylmethane derivative
N,N-diethyl-2-[(4-phenylmethyl)-phenoxy]-ethanamine, HCl (DPPE).
These new compounds have no affinity for the estrogen receptor and bind
with various affinities to the anti-estrogen binding site. These
compounds have been shown to display potential clinical value, as they
are cytotoxic against tumor cells and display antiviral activities.
Poirot et al. have evaluated the binding affinity for the ER extracted
from MCF-7 cells by competition with [³H]estradiol, and for binding to
the rat liver microsomal AEBS by competition with [³H]tamoxifen.

In the present study QSAR has been performed on the various
ligands for the microsomal AEBS. The binding affinity values for the
AEBS, expressed as -log Ki (pKi), were correlated with hydrophobic
(M Log P), steric (MR and Vw) and electronic (Xeq) descriptors. Table
3.2.1.A lists the pKi values along with the physicochemical data for a
set of 36 compounds for regression analysis. The correlations between
the parameters used in the present work are given in Table 3.2.1.B.
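
The conversion of an inhibition constant into the negative logarithmic scale used in this section can be sketched as follows. The sketch assumes that the constant is expressed in mol/L before the logarithm is taken, which is the usual convention but is not stated explicitly in the text; the function name and the numerical values are hypothetical.

```python
import math


def p_scale(concentration_molar):
    """Return -log10 of a concentration given in mol/L (e.g. pKi or pIC50)."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


# Hypothetical constants: 1 nM and 250 nM, written in mol/L.
pki_tight = p_scale(1e-9)
pki_weak = p_scale(2.5e-7)
```

On this scale a larger value corresponds to tighter binding, so a positive regression coefficient indicates that the descriptor favours affinity.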

Correlation of hydrophobic (M Log P), steric (MR and Vw) and
electronic (Xeq) descriptors with the binding affinity gave the
following simple linear regression equations:

-log Ki = 0.568 (± 0.325) M Log P + 0.461                      (1)
n = 36   r² = 0.559   SE = 0.472   F(1,34) = 9.733

-log Ki = 0.731 (± 0.412) MR + 0.237                           (2)
n = 36   r² = 0.596   SE = 0.412   F(1,34) = 11.641

-log Ki = 0.412 (± 0.246) Vw + 0.210                           (3)
n = 36   r² = 0.628   SE = 0.430   F(1,34) = 16.322

-log Ki = 0.578 (± 0.836) Xeq - 0.366                          (4)
n = 36   r² = 0.196   SE = 0.661   F(1,34) = 0.310

M Log P, MR and Vw gave fairly good monoparametric regression
equations. Equations (1), (2) and (3), corresponding to these
parameters, account for 56%, 60% and 63% of the variance (R = 0.75,
R = 0.77 and R = 0.79), and their F-values are significant at the 95%
confidence level. Correlation of pKi with Xeq, however, gave a very poor
regression equation. Equation (4) is statistically insignificant, having
a very low r² value, a high SE value and a low F-value. The coefficient
of Xeq has a very large error.

It may be noticed from Table 3.2.1.B that no autocorrelation
exists between M Log P, MR and Vw, so these parameters can be taken
together. Performing MRA with M Log P and MR gave the following
equation:

-log Ki = 0.560 (± 0.312) M Log P + 0.513 (± 0.296) MR + 0.440     (5)
n = 36   r² = 0.673   SE = 0.412   F(2,33) = 12.312


Equation (5) accounts for 67% of the variance (R = 0.82) and the
F-value is significant at the 95% level. Incorporation of the Vw
descriptor in equation (5) led to a significant improvement in the r²
value.

-log Ki = 0.612 (± 0.313) M Log P + 0.482 (± 0.311) MR
          + 0.416 (± 0.200) Vw + 0.332                             (6)
n = 36   r² = 0.757   SE = 0.410   F(3,32) = 18.276


In our attempt to increase the correlation to a still higher value,
an indicator parameter, Ind, was used for the substituents having the
-N(C4H8)O group at the R position. A significant improvement in the r²
value was observed.

-log Ki = 0.733 (± 0.322) M Log P + 0.551 (± 0.243) MR
          + 0.482 (± 0.220) Vw - 0.182 (± 0.220) Ind + 0.317       (7)
n = 36   r² = 0.878   SE = 0.326   F(4,31) = 28.340

Equation (7) is sound in statistical terms. It accounts for 88% of
the variance and the F-value is significant at the 99% confidence level.

CORRELATION MATRIX

TABLE: 3.2.1.B

3.2.2 QSAR STUDIES ON 2-AMINO-4,6-DIARYLPYRIDINES
AS NOVEL LIGANDS FOR THE ESTROGEN RECEPTOR [176]

Compounds in this series include 2-amino-4,6-diarylpyridines as
novel ligands for the estrogen receptor, effective in the treatment of
menopausal symptoms and in the prevention and management of
postmenopausal osteoporosis. These ligands were developed as
structurally novel templates which are readily amenable to parallel
synthesis. Henke et al. have evaluated the binding affinity for the ERα
and ERβ subtypes via a scintillation proximity assay (SPA) using a
bacterial lysate containing overexpressed GST-hERα or GST-hERβ
ligand-binding domain.

The binding affinity data, expressed as -log Ki, were correlated
with hydrophobic (M Log P), steric (MR and Vw) and electronic (Xeq)
descriptors. Table 3.2.2.A lists the pKi values along with the
physicochemical data for a set of 16 substituted diarylpyridines for
both the ERα and ERβ subtypes for regression analysis. The correlations
between the parameters for the ERα and ERβ subtypes used in the present
study are recorded in Table 3.2.2.B and Table 3.2.2.C.

3.2.2.A QSAR STUDY OF SUBSTITUTED DIARYLPYRIDINES
BINDING TO ERα SUBTYPE

Simple linear regression analysis of M Log P, MR, Vw and Xeq
with -log Ki for ERα gave the following correlations:

-log Ki = 0.943 (± 0.428) M Log P - 0.853                      (1)
n = 16   r² = 0.571   SE = 0.303   F(1,14) = 4.440

-log Ki = 0.832 (± 0.414) MR + 0.541                           (2)
n = 16   r² = 0.326   SE = 0.573   F(1,14) = 1.680

-log Ki = 0.636 (± 0.312) Vw + 0.394                           (3)
n = 16   r² = 0.042   SE = 6.930   F(1,14) = 0.731

-log Ki = 0.414 (± 0.208) Xeq + 0.611                          (4)
n = 16   r² = 0.003   SE = 8.720   F(1,14) = 0.050

The monoparametric equations (3) and (4) show very poor
correlations. Equations (1) and (2), however, can be accepted from a
statistical point of view. Their F-values are significant at the 90%
confidence level. It may be noticed from Table 3.2.2.B that no
autocorrelation exists between M Log P and MR, so these descriptors can
be used to perform multiple regression analysis.

-log Ki = 0.860 (± 0.424) M Log P + 0.697 (± 0.301) MR - 0.819     (5)
n = 16   r² = 0.709   SE = 0.316   F(2,13) = 11.680

Equation (5) is a fairly good regression equation. It accounts for
71% of the variance (R = 0.84) and the F-value is significant at the 99%
confidence level.

In our ongoing efforts to improve the correlation to a still higher
level, an indicator parameter, Ind, for the number of acceptor atoms for
H-bonds (N, O, F) was taken. It was assigned the value one when the
number of acceptor atoms for H-bonds was three and zero in the remaining
cases. A significant improvement in correlation was observed on
incorporating Ind in equation (5), as shown below:

-log Ki = 0.863 (± 0.332) M Log P + 0.419 (± 0.198) MR
          + 0.102 (± 0.007) Ind + 0.199                            (6)
n = 16   r² = 0.922   SE = 0.213   F(3,12) = 24.780

Equation (6) is excellent in statistical terms. It accounts for 92%
of the variance (R = 0.96) and the F-value is significant at the 99%
confidence level.

3.2.2.B QSAR STUDY OF SUBSTITUTED DIARYLPYRIDINES
BINDING TO ERβ SUBTYPE

The correlation of the binding affinity for the ERβ subtype with
hydrophobic, steric and electronic descriptors gave the following
regression equations:

-log Ki = 0.516 (± 0.268) M Log P - 0.362                      (7)
n = 16   r² = 0.374   SE = 0.512   F(1,14) = 1.769

-log Ki = 0.648 (± 0.581) MR + 0.346                           (8)
n = 16   r² = 0.108   SE = 0.602   F(1,14) = 0.321

-log Ki = 0.521 (± 0.411) Vw + 0.414                           (9)
n = 16   r² = 0.116   SE = 0.577   F(1,14) = 0.365

-log Ki = 0.662 (± 0.683) Xeq + 0.312                          (10)
n = 16   r² = 0.008   SE = 7.261   F(1,14) = 0.041


The above monoparametric equations show very poor correlations
with the binding affinity values. From the correlation matrix (Table
3.2.2.C) it was observed that all the descriptors showed high
autocorrelation amongst themselves except M Log P and Vw, between which
no autocorrelation was seen. Hence MRA was performed taking these two
descriptors together, and the following correlation was obtained:

-log Ki = 0.761 (± 0.502) M Log P + 0.613 (± 0.343) Vw + 0.187     (11)
n = 16   r² = 0.774   SE = 0.512   F(2,13) = 8.60

Equation (11) accounts for 77% of the variance and the F-value is
significant at the 95% confidence level. Although equation (11) is
satisfactory from a statistical point of view, it has the shortcoming
that its error terms are comparatively high. In order to remove this
shortcoming and improve the degree of correlation, an indicator
parameter, Ind-1, for the number of rotatable bonds was taken. It was
assigned the value 1 when the number of rotatable bonds was equal to or
greater than five and zero in other cases. A significant improvement in
correlation was observed:

-log Ki = 0.431 (± 0.120) M Log P + 0.368 (± 0.107) Vw
          + 0.132 (± 0.004) Ind-1 + 0.214                          (12)
n = 16   r² = 0.848   SE = 0.322   F(3,12) = 16.330

Equation (12) accounts for 85% of the variance (R = 0.92) and the
F-value is significant at the 99% confidence level. This equation is
very well acceptable from a statistical point of view.

CORRELATION MATRIX

TABLE: 3.2.2.B
(FOR ERα SUBTYPE)

CORRELATION MATRIX

TABLE: 3.2.2.C

3.2.3 QSAR STUDIES ON CIS-3,4-DIARYL-HYDROXY
CHROMANES AS HIGH AFFINITY PARTIAL
AGONISTS FOR ESTROGEN RECEPTOR¹⁷⁷

Compounds in this series include the cis-3,4-diaryl-hydroxy-
chromanes, a new group of non-steroidal partial estrogens effective in the
treatment of postmenopausal degenerative diseases, particularly
osteoporosis and coronary heart disease. The IC50 binding affinities of the
chromanes to the estrogen receptor were determined by Bury et al. by
measuring their ability to compete with [³H]-17β-estradiol for receptor
binding in ER-rich cytosol derived from rabbit uterine tissue in a dextran
coated charcoal (DCC) assay. In this discussion QSAR studies have been
performed on the racemate and enantiomeric forms of substituted hydroxy
chromanes using hydrophobic (M Log P), steric (MR and Vw) and
electronic (Xeq) descriptors.

Table 3.2.3.A lists the log IC50 values along with the
physicochemical data for a set of 16 substituted racemate and enantiomeric
hydroxy chromanes. Table 3.2.3.B and Table 3.2.3.C represent the
correlation matrices between the descriptors used in the present study for
the racemate and enantiomeric hydroxy chromanes.

3.2.3.A QSAR STUDY OF RACEMATE FORM OF
SUBSTITUTED HYDROXY CHROMANES

Simple linear regression analysis of M Log P, MR, Vw and Xeq
with log IC50 gave the following correlations:

- log IC50 = 0.926 (±0.731) M Log P + 0.396   -> (1)

n = 16   r² = 0.282   SE = 0.552   F(1,14) = 0.461

- log IC50 = 0.822 (±0.631) MR + 0.712   -> (2)

n = 16   r² = 0.116   SE = 0.612   F(1,14) = 0.417

- log IC50 = 0.722 (±0.524) Xeq - 0.315   -> (3)

n = 16   r² = 0.327   SE = 0.414   F(1,14) = 0.826

- log IC50 = 0.811 (±0.674) Vw + 0.540   -> (4)

n = 16   r² = 0.045   SE = 0.686   F(1,14) = 0.181

Equations (1), (2), (3) and (4) show very poor correlation with the
binding affinity. After checking the correlation matrix (Table 3.2.3.B) it
was observed that no autocorrelation exists between M Log P and MR, nor
between M Log P and Xeq, so these parameters were taken together and
multiple regression analysis was performed, which yielded the following
regression equations:

- log IC50 = 0.932 (±0.814) M Log P + 0.674 (±0.696) MR + 0.611   -> (5)

n = 16   r² = 0.374   SE = 0.482   F(2,13) = 0.621

- log IC50 = 1.011 (±0.622) M Log P - 0.321 (±0.131) Xeq + 0.410   -> (6)

n = 16   r² = 0.634   SE = 0.426   F(2,13) = 5.440

Significant improvement in correlation was observed when an
indicator parameter, Ind, was introduced in equation (6). The indicator
parameter accounts for meta substituents at the G position: its value
is taken as unity if the substituent is attached at the meta position of
the aromatic ring and zero in all other cases.

- log IC50 = 0.523 (±0.571) M Log P - 0.511 (±0.312) Xeq +
             0.230 + 0.311 (±0.102) Ind + 0.414   -> (7)

n = 16   r² = 0.870   SE = 0.246   F(3,12) = 20.450

Equation (7) accounts for 87% of the variance (R = 0.93) and the
F-value is also significant at the 99% confidence level.

3.2.3.B QSAR STUDY OF ENANTIOMERIC FORM OF
SUBSTITUTED HYDROXY CHROMANES

Simple regression analysis using M Log P, MR, Xeq and Vw with
- log IC50 resulted in the following regression equations:

- log IC50 = 0.812 (±0.664) M Log P + 0.414   -> (8)

n = 16   r² = 0.252   SE = 0.447   F(1,14) = 0.474

- log IC50 = 0.761 (±0.510) MR + 0.556   -> (9)

n = 16   r² = 0.198   SE = 0.612   F(1,14) = 0.248

- log IC50 = 0.861 (±0.662) Xeq - 0.532   -> (10)

n = 16   r² = 0.246   SE = 0.510   F(1,14) = 0.326

- log IC50 = 0.961 (±0.772) Vw - 0.661   -> (11)

n = 16   r² = 0.046   SE = 0.718   F(1,14) = 0.016

From equations (8), (9), (10) and (11) it is observed that the linear
regression gave very poor correlations. After checking the correlation
matrix (Table 3.2.3.C) it can be inferred that no autocorrelation exists
between M Log P and MR, nor between M Log P and Xeq, so these parameters can
be taken together. On applying multiple regression analysis, the following
regression equations were obtained:

- log IC50 = 0.574 (±0.221) M Log P + 0.592 (±0.331) MR + 0.772   -> (12)

n = 16   r² = 0.515   SE = 0.441   F(2,13) = 2.316

- log IC50 = 0.769 (±0.513) M Log P - 0.661 (±0.512) Xeq + 0.512   -> (13)

n = 16   r² = 0.447   SE = 0.417   F(2,13) = 1.761

Moderate improvement in r² values is observed in equations (12)
and (13). These equations, however, are not very sound statistically,
as they have high standard error values and their F-values are significant
only at the 90% confidence level. In an attempt to obtain a statistically
sound regression equation, an indicator parameter, Ind1, denoting the
presence of a fluorine atom was introduced. As a consequence, excellent
improvement in correlation was observed, as depicted by the following equations:

- log IC50 = 0.612 (±0.226) M Log P + 0.586 (±0.210) MR
             + 0.018 (±0.007) Ind1 + 0.310   -> (14)

n = 16   r² = 0.836   SE = 0.263   F(3,12) = 18.330

- log IC50 = 0.712 (±0.288) M Log P - 0.513 (±0.310) Xeq
             + 0.197 (±0.009) Ind1 + 0.312   -> (15)

n = 16   r² = 0.763   SE = 0.301   F(3,12) = 11.760

Equations (14) and (15) account for 84% and 76% of the variance
(R = 0.91 and R = 0.87), and their F-values are also significant at the
99% confidence level.

CORRELATION MATRIX
TABLE: 3.2.3.B
RACEMATE FORM

            M Log P    MR       Vw       Xeq
M Log P     1.000
MR          0.482      1.000
Vw          0.747      0.661    1.000
Xeq         0.226      0.559    0.881    1.000
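
The screening of descriptor pairs against the correlation matrix can be sketched in Python as follows. The values are those of Table 3.2.3.B; the cutoff of 0.5 for treating two descriptors as not intercorrelated is an assumption made for this illustration, since the text does not state the threshold that was used, and the function name is hypothetical.

```python
from itertools import combinations

# Off-diagonal values copied from Table 3.2.3.B (racemate form).
CORR = {
    ("M Log P", "MR"): 0.482,
    ("M Log P", "Vw"): 0.747,
    ("MR", "Vw"): 0.661,
    ("M Log P", "Xeq"): 0.226,
    ("MR", "Xeq"): 0.559,
    ("Vw", "Xeq"): 0.881,
}
DESCRIPTORS = ["M Log P", "MR", "Vw", "Xeq"]


def usable_pairs(corr, names, cutoff=0.5):
    """Return descriptor pairs whose |r| lies below the cutoff.

    The cutoff is an assumed value, not one stated in the thesis.
    """
    pairs = []
    for a, b in combinations(names, 2):
        r = corr.get((a, b), corr.get((b, a)))
        if abs(r) < cutoff:
            pairs.append((a, b))
    return pairs
```

With this cutoff, usable_pairs(CORR, DESCRIPTORS) returns the pairs (M Log P, MR) and (M Log P, Xeq), which are the combinations used in equations (5) and (6) of the racemate series.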


TABLE: 3.2.3.C
ENANTIOMERIC FORM

            M Log P    MR       Vw       Xeq
M Log P     1.000
MR          0.316      1.000
Vw          0.592      0.404    1.000
Xeq         0.722      0.869    0.881    1.000

CONCLUSIONS

From the regression analysis data of the various series (3.2.1,
3.2.2, 3.2.3) studied in this section for the binding affinity of estrogen
receptors expressed in IC50 (nM) form, it is observed that the hydrophobic
descriptor M Log P here also has a significant role to play; however, in this
case M Log P enhances the binding, as its coefficient is positive,
the reverse of the previous section. Besides MR and Xeq, Vw here has a
positive contribution towards the binding affinity. Investigation of the
various non-steroidal analogues taken for QSAR study in this section
reveals that sterically large, non-polar groups are necessary to achieve good
receptor binding affinity.

From the correlations of AEBS binding affinities of diphenyl
methane based tamoxifen derivatives (Series 3.2.1) that are selective for
AEBS over the estrogen receptor, it follows that the M Log P, MR and Vw
descriptors are critical for the affinity to AEBS. A good linear quantitative
structure-affinity relationship has been established. Equations (6) and (7)
(Series 3.2.1) illustrate that the rigid triphenylethylenic moiety of tamoxifen
defines a spatial relationship driving the occupancy of tamoxifen within AEBS
and the binding of compounds. The positive coefficient of the indicator
parameter indicates that morpholinic derivatives have higher affinity. This
illustrates that the ring part of the amine is important for high affinity.

In series 3.2.2 and 3.2.3 a positive hydrophobic term appears for
both the ERα and ERβ subtypes, indicating a hydrophobic contribution
to the binding by the substituents examined for diarylpyridines as well as
hydroxy chromanes. Most of the substituents, however, are relatively polar
and would not provide much opportunity for enhanced hydrophobic
bonding. The positive coefficient of Vw is also significant because it
results in higher binding affinity values for bulkier molecules. The
positive coefficient of the indicator parameter suggests that substituents
having the tendency to accept H-bonds are preferred in enhancing the
binding affinity.

The common feature observed in all the series was that M Log P
has a large and positive coefficient, which indicates that hydrophobic
substituents are favoured. The positive coefficient of MR indicates that
sterically bulky substituents raise the binding affinity: as the
substituents become larger, the positive steric
interaction becomes dominant and affinity increases. This view is
consistent with the crystal structure of the estrogen receptor ligand binding
domain¹⁸⁰.

The X-ray crystal structure of the receptor shows that there is 
some very hydrophobic space above the B-ring of the ligand at position 11, 
presumably sufficient to accommodate groups of moderate size without 
contacting the binding surface of the receptor. Larger substituents would 
require some movement of ligand or receptor for a complex to form. The 
conclusions from QSAR of the non-steroidal compounds performed in this 
section are similar to those made from an earlier analysis of structure 
affinity relationship. 

Thus, as a generalization, substituents that increase the electron
density on the phenolic ring appear to increase the binding affinity. Also,
the inference of a positive hydrophobic interaction between ligand
substituents and receptor (the same phenomenon has been shown in the CoMFA
analysis¹⁸¹) needs to be tempered by the recognition that the ABCD tetracyclic
core structure of steroidal estrogens (as well as the corresponding units in
nonsteroidal estrogens) is generally very hydrophobic and may contribute
to the bulk of ligand binding by a hydrophobic mechanism.

CHAPTER-4

3D-QSAR AND
MOLECULAR MODELLING

SECTION-A

INTRODUCTION TO 3D QSAR USING APEX-3D
AND CATALYST

(1) APEX-3D: AN EXPERT SYSTEM FOR AUTOMATED
PHARMACOPHORE IDENTIFICATION AND 3D-QSAR
MODEL BUILDING


The calculation of 3D-QSAR models was done on Apex-3D, which is
developed to represent, elucidate and utilize knowledge on
structure-activity relationships. Apex-3D can be used to build 3D-SAR and
3D-QSAR models which can be used for activity classification and prediction.
The general principle of operation is based on emulating the intelligence of
the researcher engaged in establishing relationships between a compound's
structural parameters and its activity. The cornerstone of the Apex-3D
methodology is automated identification of biophores (pharmacophores).
These biophores can be used for building qualitative activity prediction
rules and for creating search queries to identify new leads in a 3D-database.
Identified biophores can be used as starting points for constructing
3D-QSAR models when good quantitative data is available. Combination of a
3D pharmacophore with a quantitative regression equation is unique to the
Apex-3D approach. Prediction of activity for novel compounds requires that
the biophore be present. The activity level is calculated from the QSAR equation.


4.1 GENERAL PRINCIPLES OF APEX-3D:
BIOPHORIC PATTERNS AND BUILDING 3D-QSAR
MODELS

Apex-3D is based on the logico-structural approach to drug
design developed by Dr. Valery Golender and his colleagues (1980, 1983,
1993, 1995). This approach, to a certain extent, simulates the intelligence of
a scientist engaged in establishing relationships between certain structural
characteristics of compounds and their activity.



These basic inductive methods of agreement, difference, and
concomitant variations are used by researchers to identify structural
patterns associated with bioactivity:

(1) The agreement method is based on identification of the common
structural patterns in different compounds having the same type
of biological activity.

(2) The difference method is based on identification of the different
structural patterns in active and inactive compounds.

(3) The concomitant variations method is based on identification of
variation in structural properties that explain changes in the
biological activity of a set of compounds.

4.1.1 PRINCIPLE OF AUTOMATED IDENTIFICATION OF
BIOPHORIC PATTERNS

The automated identification of biophoric patterns according to 
the logico-structural approach is based on the agreement and difference 
methods and involves the following steps; 

1 . Separation of data set compounds into activity classes according 
to their activity type or level. 

2. Generation of structural representations based on topological 
(2D) or topographical (3D) distance matrices, and sets of 
structural indexes for identifying biophoric descriptor centers in 
chemical compounds. 

3. Identification of common structural patterns (features) in all pairs 
of compounds belonging to a common activity class. 

4. Calculation of the number of occurrences of all identified
structural patterns (features) among compounds from each
activity class of the analyzed data sets.

These occurrence numbers are used to calculate statistical
estimates of features:

> The probability that novel compounds having a given feature will
belong to a certain activity class.

> The reliability, calculated as the probability of nonchance
occurrence of the feature.

5. Identification of biophores. Biophores are selected as features
having both probability and reliability higher than certain
thresholds. These thresholds are established during training of the
activity prediction system.

6. Prediction of biological activity of novel compounds which have
been synthesized, or suggested for synthesis, based on the
identified biophores.

7. Analysis of computer selected biophores and application to the
rational synthesis of compounds possessing desirable biological
activity.
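
The two statistical estimates and the threshold test can be sketched in Python as follows. This is only an illustration: the text does not give the estimators used by Apex-3D, so the probability is taken here as a simple occurrence ratio and the reliability as one minus a hypergeometric tail probability. Both choices, the default thresholds, and all function names are assumptions.

```python
from math import comb


def feature_probability(n_with_active, n_with_total):
    """Fraction of the feature-bearing compounds that belong to the active class."""
    return n_with_active / n_with_total


def chance_tail(n_total, n_active, n_with_total, n_with_active):
    """P(X >= n_with_active) when the feature-bearing compounds are drawn at random.

    X follows a hypergeometric distribution (an assumed chance model).
    """
    upper = min(n_active, n_with_total)
    ways = sum(
        comb(n_active, k) * comb(n_total - n_active, n_with_total - k)
        for k in range(n_with_active, upper + 1)
    )
    return ways / comb(n_total, n_with_total)


def feature_reliability(n_total, n_active, n_with_total, n_with_active):
    """Probability that the observed occurrence pattern is not due to chance."""
    return 1.0 - chance_tail(n_total, n_active, n_with_total, n_with_active)


def is_biophore(probability, reliability, p_min=0.8, r_min=0.95):
    """Keep a feature only if both estimates exceed their (assumed) thresholds."""
    return probability > p_min and reliability > r_min
```

In this sketch a feature found in four compounds, all of them active, out of ten compounds of which five are active, would pass both thresholds, whereas the same feature split evenly between active and inactive compounds would not.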

4.1.2 PRINCIPLE OF 3D-QSAR MODEL GENERATION

The identified biophores are used as starting points for building
3D-QSAR models. This procedure parallels reasoning based on the
concomitant variations method and involves the following steps:

1. Automated identification of biophores (1 through 5 above).

2. Optimization of the superimposition of all compounds containing
a selected biophore.

3. Building 3D-QSAR models for the selected biophores based on
correlation of ligand active sites and global molecular properties
with biological activity.

4. Prediction of the activity level of novel compounds based on
identified biophores and 3D-QSAR models.

5. Analysis of selected 3D-QSAR models and application to rational
drug design.

4.2 APEX-3D STRUCTURAL DESIGN: MODULES AND
CONCEPTS

Apex-3D is a learning rule-based expert system that accumulates
knowledge and makes inferences on structure-activity relationships from
structure-activity data. The system stores two types of information:

1. Data on topological (2D) and topographical (3D) structures of
chemical compounds.

2. Knowledge of structure-activity relationships expressed as rules
that associate biophoric patterns with particular biological
activities.

Apex-3D rules take the following forms:

Qualitative rules:

IF structure S contains the biophoric pattern Bi, THEN it
possesses the activity Ak with probability Pik.

Quantitative rules:

IF structure S contains the biophoric pattern Bi having the
associated QSAR model A = F(Bi, S), THEN it possesses the activity
Ak calculated according to the model.

The main component of Apex-3D is an inference engine
containing two modules:

An inductive inference engine used for biophore search and rule
generation.

A deductive inference engine used for rule-based activity
prediction.

Rule generation is based on structure matching procedures of the
logico-structural approach.

4.2.1 APEX-3D MODULES

4.2.1.1 INSIGHT II COMPONENT

Apex-3D is integrated with the Insight II molecular modelling
and molecular graphics environment in such a way that one can:

Build 2D sketches and 3D models of chemical compounds in Insight II and
use Apex-3D for activity prediction and biophore searching.

Define the Task Definition parameters interactively using Insight II tables.

Export compounds, biophores, and superimpositions of compounds sharing
a common biophoric pattern from Apex-3D to Insight II.

4.2.1.2 COMPUTATIONAL CHEMISTRY COMPONENT

Computational chemistry programs calculate structural indexes
used for biophore definition and conformational space clustering. These
programs include:

> MOPAC 6.0 (QCPE program) postprocessor for calculating
MOPAC-based indexes, including atomic charges, pi-populations,
and donor and acceptor indexes.

> Module for calculating hydrophobicity and molar refractivity
based on atomic contributions (Ghose et al. 1988; Viswanadhan
et al. 1989).

> Module for calculating indexes based on qualitative models. A
detailed description of indexes is given in.

> Module for reducing the number of conformations using a
clusterization methodology.


4.2.2 BIOPHORE CONCEPT OF APEX-3D

A biophore represents a certain structural and electronic pattern
in a bioactive molecule which is responsible for its activity, possibly due to
receptor interaction. Apex-3D identifies two types of biophores:

> Topological (2D) biophores based on graph-theoretical structure
representation.

> Topographical (3D) biophores based on 3D structure
representation.

4.2.2.1 TOPOLOGICAL BIOPHORES

Topological biophores consist of common descriptor centers
separated by some number of bonds. These biophores are displayed for one
molecule at a time. The bonds separating the descriptor centers can be
highlighted. Note that multiple structures cannot be superimposed on the
biophore.

4.2.2.2 TOPOGRAPHICAL BIOPHORES

Topographical biophores define certain superimpositions of
compounds in 3D space according to their common biophores. This
superimposition serves as a convenient graphical image, as well as a tool
for extracting additional structural information on the chirality and
environment of biophores. Since biophore identification in Apex-3D is
based on distance matrices that are invariant under reflection, biophores are
isomorphic for different stereoisomers. Thus the resulting superimposition
of mirror images usually causes deviations in the positions of matched
atoms.




4.3 MOLECULAR ALIGNMENT AND MOLECULAR
SUPERIMPOSITION IN APEX-3D

Each identified biophore serves as a basis for molecular
superimposition by defining a common coordinate system for molecules
sharing that biophore. But for most compounds this superimposition is not
unique because several molecular fragments can match the biophore and a
number of conformers can fit it. Therefore, a special procedure for
superimposition optimization is needed to select the best, or at least a
reasonable, superimposition.

Molecular alignment in Apex-3D for superimposition
optimization is based on three simple intuitive principles:

1. Biophore anchor principle: Biophoric centers must be superimposed 
for all molecules with minimal deviation. 

2. Similarity principle: Atoms of the same chemical type from different 
molecules must be superimposed as closely as possible. 

3. Atom adjacency principle: The closer an atom is to a biophoric center, 
the more important is its alignment. 

4.4 CHEMICAL STRUCTURE REPRESENTATION IN
APEX-3D: DESCRIPTOR CENTERS AND DISTANCE
MATRICES

As mentioned earlier, a biophore represents a certain structural
pattern presumed to be responsible for the biological activity. These
biophores are the basis for Apex-3D's approach to chemical structure
representation. There are two parts to biophore representation in Apex-3D:

1. Descriptor Centers that represent parts of hypothetical biophoric
moieties capable of interacting with a receptor.

2. Distance Matrices that describe the mutual orientation of descriptor
centers using topological (number of bonds) or topographical distances
(angstroms).

Descriptor centers can be either atoms or pseudoatoms that can
participate in the ligand-receptor interactions based on the following types
of physical properties:

> Electrostatic interactions

> Hydrogen bonds

> Charge transfer complexes

> Hydrophobic interaction

> van der Waals (or London) dispersion forces

These physical properties correlate with certain structural indexes
which are calculated using various computational chemistry methods.

All of the descriptor center information is stored by Apex-3D in 
two matrices; 

1. The Property Matrix stores the structural indexes for all 
descriptor centers identified in a given structure. 

2. The Distance Matrix stores the distances between all pairs of 
descriptor centers. 
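
A minimal Python sketch of the two matrices is given below. The dictionary layout of a descriptor center, the index names, and the function name are hypothetical choices made for illustration; they are not the internal data structures of Apex-3D.

```python
import math


def build_matrices(centers, index_names):
    """Return (property_matrix, distance_matrix) for a list of descriptor centers.

    Each center is a dict holding an "xyz" coordinate tuple (angstroms) and one
    value per structural index name. This layout is an assumption.
    """
    # One row per descriptor center, one column per structural index.
    property_matrix = [[c[name] for name in index_names] for c in centers]
    # Symmetric matrix of Euclidean distances between all pairs of centers.
    distance_matrix = [
        [math.dist(a["xyz"], b["xyz"]) for b in centers] for a in centers
    ]
    return property_matrix, distance_matrix
```

Because the distance matrix depends only on inter-center distances, it is unchanged by rotation, translation, or reflection of the molecule.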

Data from these matrices are used to define the biophores which
are a subset of the matrices common to several compounds.

4.4.1 STRUCTURAL INDEXES FOR DESCRIPTOR CENTERS


In Apex-3D, structural indexes are divided into two groups: 

1. Indexes calculated using the MOPAC 6.0 semi-empirical 
quantum-chemical program (QCPE program). 

2. Indexes calculated using simplified computational chemistry 
models. 

4.4.2 DISTANCE MATRICES ALGORITHM FOR
BIOPHORE EXTRACTION IN APEX-3D:
EXHAUSTIVE SEARCH AND FAST SEARCH

Structure matching in Apex-3D is based on the selection of
maximal common 2D or 3D patterns of biophoric centers between two
compounds. The algorithm constructs the compatibility graph and selects
its cliques (Golender and Rosenblit 1983 pp. 129-143).

An efficient clique-finding procedure is used to select the 
compatibility graph cliques and delete isomorphic patterns. 
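
The compatibility-graph idea can be sketched in Python as follows. This is a generic illustration, not the Apex-3D implementation: centers are matched when they have the same type label, two matches are compatible when the corresponding inter-center distances agree within a tolerance, and maximal cliques are enumerated with the basic Bron-Kerbosch procedure. The type labels, the tolerance value, and the function names are assumptions.

```python
def compatibility_graph(types_a, dist_a, types_b, dist_b, tol=0.5):
    """Nodes are (i, j) pairs of same-type centers; edges join consistent pairs."""
    nodes = [
        (i, j)
        for i, ta in enumerate(types_a)
        for j, tb in enumerate(types_b)
        if ta == tb
    ]
    adj = {n: set() for n in nodes}
    for x in range(len(nodes)):
        for y in range(x + 1, len(nodes)):
            (i, j), (k, l) = nodes[x], nodes[y]
            # Distinct centers in both compounds, distances equal within tol.
            if i != k and j != l and abs(dist_a[i][k] - dist_b[j][l]) <= tol:
                adj[nodes[x]].add(nodes[y])
                adj[nodes[y]].add(nodes[x])
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def largest_common_pattern(types_a, dist_a, types_b, dist_b, tol=0.5):
    """Largest set of matched centers sharing one distance pattern."""
    adj = compatibility_graph(types_a, dist_a, types_b, dist_b, tol)
    return max(maximal_cliques(adj), key=len, default=set())
```

Each clique corresponds to one candidate common pattern; removing patterns that are isomorphic to one another is not shown in this sketch.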

Apex-3D supports two versions of the biophore extraction 
algorithm for compounds with multiple conformations. 

1 . Exhaustive Search: Matches all possible pairs of conformers of all 
compounds. Each conformer of each compound will be matched with 
each conformer of all the other compounds in the training set. 

2. Fast Search: Matches the first conformer of a compound with all 
conformers of all the other compounds. This is done for each 
compound in the training set. 
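
The difference between the two search modes can be made concrete by enumerating the conformer pairs that each mode would match. The Python sketch below only counts and lists pairs; the matching itself is not shown, and the dictionary layout and function names are hypothetical.

```python
from itertools import combinations, product


def exhaustive_pairs(conformers):
    """Every conformer of each compound against every conformer of every other.

    conformers maps a compound name to its list of conformer identifiers.
    """
    pairs = []
    for a, b in combinations(conformers, 2):
        pairs.extend(
            ((a, ca), (b, cb)) for ca, cb in product(conformers[a], conformers[b])
        )
    return pairs


def fast_pairs(conformers):
    """First conformer of each compound against all conformers of the others."""
    pairs = []
    for a in conformers:
        first = conformers[a][0]
        for b in conformers:
            if b != a:
                pairs.extend(((a, first), (b, cb)) for cb in conformers[b])
    return pairs
```

For three compounds with ten conformers each, this sketch gives 300 conformer pairs for the exhaustive search and 60 for the fast search.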

Both of these algorithms use the first compound as a template
when setting biophore property and distance matrices. In situations where
the names given to the training set of compounds place an unusual
molecule as the first compound (template), Apex-3D may have trouble
identifying the best biophores for training. In these rare instances, a priority
assignment can be made during task definition that will allow you to
specify compound ordering.

4.5 ACTIVITY PREDICTION TRAINING SYSTEM:
PREDICTION OF BIOLOGICAL ACTIVITY ON THE
BASIS OF IDENTIFIED BIOPHORES/
PHARMACOPHORES

The ultimate goal of biophore selection is to better design new 
bioactive compounds. One of the tools for achieving this goal is the 
prediction of biological activity on the basis of identified biophores. 

Apex-3D's activity prediction system works as a filter, filtering 
out inactive compounds and sending only supposedly active compounds to 
the output. 

Suppose a number of compounds are given as input, some of them being
active. The activity prediction system recognizes n compounds as active,
only some of which are actually active.

Two types of errors can be made during prediction. The first type 
is associated with missed active compounds. Active compounds classified 
as inactive are called false negatives. 

The second type of error is associated with classification of
inactive compounds as active. These compounds are called false positives.
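
The two error types can be tallied with a small Python helper such as the one below. The function name and the dictionary keys are hypothetical; the inputs are parallel lists of true and predicted activity flags.

```python
def confusion_counts(actual, predicted):
    """Count prediction outcomes for parallel lists of booleans (True = active)."""
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for is_active, said_active in zip(actual, predicted):
        if is_active and said_active:
            counts["true_pos"] += 1
        elif is_active and not said_active:
            counts["false_neg"] += 1  # missed active compound
        elif not is_active and said_active:
            counts["false_pos"] += 1  # inactive compound passed by the filter
        else:
            counts["true_neg"] += 1
    return counts
```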

Apex-3D uses two types of activity prediction training 
procedures: 

1. Reclassification, where prediction is done on one of the training
set compounds using all of the training set compounds.

2. Leave-one-out recognition, where training is done N times for N - 1
compounds of the data set, and a prediction is made for the Nth
compound.

Leave-one-out recognition gives a more realistic estimate of the 
predictive power of the recognition system. This is because one of the 
compounds is excluded from the training set and a prediction of activity is 
made based on a model generated from the remaining set of compounds. 
During the training process, Apex-3D automatically sets thresholds for the 
selection of biophores according to probability and reliability. 
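
The leave-one-out scheme itself is independent of the model being trained. The Python sketch below illustrates it with a one-descriptor least-squares line standing in for the recognition system; that stand-in model and the function names are assumptions, and biophore selection is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out(xs, ys):
    """Predict each compound from a model trained on the other N - 1 compounds."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions
```

Both arguments are plain Python lists of equal length; comparing the returned predictions with the observed values gives the leave-one-out estimate of predictive power.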

4.5.1 ACTIVITY PREDICTION

Apex-3D uses the rules generated during task building for 
activity prediction. New molecules for prediction can consist of one or 
more conformations just like the compounds used in the training set. 

There are two modes of operation for activity prediction: 

□ Classification, in which Apex-3D attempts to assign a new molecule 
to one or more activity classes that were defined for the task. 

□ Quantitative, in which Apex-3D attempts to predict the activity of a
new molecule based on 3D-QSAR models present in the task's
knowledge base.

4.6 3D-QSAR MODELS IN TERMS OF BIOPHORIC
SITES AND SECONDARY SITES

3D-QSAR model building in Apex-3D allows you to identify 
potential interactive sites in ligand molecules and correlate the 
physicochemical properties of these sites and global molecular properties 
with available quantitative biological data. Ligand active sites are centered 
on atoms and are divided into two groups: 



□ Biophore sites are centers of specific ligand-receptor interactions 
participating in biophore definition and present in all analyzed 
molecules. 

□ Secondary sites are centers of specific ligand-receptor interactions 
that may be present in only a subset of the analyzed structures and 
allow mapping of secondary receptor pockets which may modify 
ligand activity. 

Such subdivision of active site groups can also tailor the 
complexity of the 3D-QSAR model towards available data. Models based 
only on biophore sites are more robust and less influenced by 
conformational uncertainties. Introduction of secondary sites usually 
requires more extensive molecular modeling for the specification of proper 
flexible tail positions. 

Model parameters are based on an active site model and structural 
indexes calculated in Apex-3D's Computational Chemistry module. The 
calculated atomic properties are rounded off before use, based on an 
estimated parameter error. This helps to avoid chance correlations based on 
insignificant variability in the property. The parameters are divided into the
following three groups:

1. Biophore site indexes

Charge, pi-population, electron donor index, electron acceptor
index, HOMO, LUMO, atomic hydrophobicity, atomic refractivity

2. Secondary site indexes

H-acceptors (presence, pi-population, charge, electron donor,
hydrophobicity, refractivity)

H-donors (presence, pi-population, charge, hydrophobicity,
refractivity)

Heteroatoms (presence, pi-population, charge, electron donor,
hydrophobicity, refractivity, formal charge)

Hydrophobic (presence, pi-population, charge, electron donor,
hydrophobicity, refractivity)

Steric (presence, pi-population, charge, electron donor,
hydrophobicity, refractivity, formal charge)

Ring centers (presence, size, number of pi-electrons)

3. Global molecular properties

Positions of the secondary sites are selected from the positions of 
atoms in molecular superimpositions. An atom of a molecule occupies the 
secondary site if its distance from the site position is less than the
user-specified site radius. To select only the most reasonable secondary sites
you can also specify the site occupancy threshold, that is, the minimal number
of compounds occupying a site before it can be included as a site.
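
The occupancy rule described above can be sketched in Python as follows. Molecules are represented simply as lists of atom coordinates taken from a superimposition; this representation and the function names are hypothetical.

```python
import math


def occupies(atom_xyz, site_xyz, radius):
    """An atom occupies the site if it lies closer than the site radius."""
    return math.dist(atom_xyz, site_xyz) < radius


def site_occupancy(site_xyz, molecules, radius):
    """Number of superimposed molecules with at least one atom inside the site."""
    return sum(
        any(occupies(atom, site_xyz, radius) for atom in atoms)
        for atoms in molecules
    )


def select_sites(candidate_sites, molecules, radius, min_occupancy):
    """Keep candidate sites occupied by at least min_occupancy molecules."""
    return [
        site
        for site in candidate_sites
        if site_occupancy(site, molecules, radius) >= min_occupancy
    ]
```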

Secondary sites serve three primary purposes: 

1. Identify possible extensions of the biophore common to the 
compounds in the model, for example, a region of space relative 
to the biophore with additional hydrogen-bond interactions which 
increase activity. 

2. Identify steric interference; regions of space which when 
occupied by the ligand decrease activity. 

3. Identify hydrophobic pockets; regions of space which when
occupied by the ligand increase activity.

The biophore chosen for 3D-QSAR model building serves as a
reference for superimposing the ligands. Biophore sites may also contribute
quantitatively to the 3D-QSAR model as additional parameters.

4.7 3D-QSAR MODEL BUILDING PROCEDURE

1 . Automated selection of biophores. 

2. Optimization of superimposition of compounds sharing a 
common biophore. 

3. Interactive specification of the 3D-QSAR model parameters 
based on physicochemical properties of biophoric features, 
secondary sites, and global molecular properties. 

4. Calculation of the best 3D-QSAR model for the selected 
biophores. 

5. Selection of the multiple regression equation using stepwise 
multiple regression. 

6. Estimation of non-randomness and predictive power of obtained 
models and filtering out unreliable models. An example of a 3D- 
QSAR model for a small set of angiotensin-converting enzyme 
inhibitors is presented. 

4.7.1 SELECTING RELIABLE 3D-QSAR MODELS

The basic statistical tool for 3D-QSAR model building in
Apex-3D is the stepwise regression algorithm (Myers 1990). This algorithm
selects multiple regression equations by deleting and adding variables using
the partial F-test criterion. Variables are added only if they increase the
predictability of the model based on the PRESS (PREdicted Sum of
Squares) statistic.



4.8 THE CATALYST METHODOLOGY

4.8.1 MOLECULE PREPARATION

The molecules are edited with the Catalyst 2D/3D sketch facility
and optimized applying the CHARMm forcefield. The molecules are then
subjected to conformational analysis using the Poling algorithm through
the ConFirm module, which offers different user controls to suit the
particular kind of molecules.


4.8.2 HYPOTHESIS GENERATION 

Catalyst allows the building of a model or a hypothesis
distinguishing chemical features essential for a particular activity in a class
of compounds in different ways, viz.

(1) By generating a hypothesis automatically from a training set of
molecules.

(2) By constructing a hypothesis manually, where the substructures
and chemical functions are assembled under the specified
geometric constraints between them.

(3) By converting a molecule to a hypothesis.

(4) By using a template molecule.

During the automated generation of a hypothesis the user has
several control parameters (Spacing, MinPoints, MinSubsetPoints,
Superimposition Error, Misses, Feature Misses, Complete Misses,
Tolerance Factor, Check Superposition, Weight Variation, Mapping Coeff,
IdealHbondGeomOnly, Variable Weight, Variable Tolerance)
in the hypothesis options of the Catalyst HypoGen module.

The goal is to find a chemical feature-based model that is
predictive beyond the training set molecules. Because the number of
variables inherent to problems of this kind is large, many simplifying
approximations are necessary in order to achieve a practical result.

4.8.3 VALIDATE HYPOTHESIS 


The final step is the validation of the hypothesis, which can be
accomplished by evaluating the statistical significance, by scrambling
activities with structures, and by testing the predictive ability on a test set.

The output from a Catalyst hypothesis generation job is the ten
lowest cost hypotheses found during the analysis that are different from
each other. It is often not possible to discriminate between these by any
simple statistical procedure, particularly if the cost differences are small
(less than 10 bits). Therefore, a visual evaluation procedure is used at this
point in the experiment.


4.8.4 APPROACH 


A set of 24 molecules was selected, according to picking rules, as 
the training set. For the test set, two separate series were taken: the 
first series had 14 molecules and the other series had 12 molecules. In 
total, 26 molecules were chosen for test set predictions. The relative 
binding affinity data for the estrogen receptor ligands were collected 
from the literature. The 3D structures were built interactively using 
Catalyst version 4.5 and minimized using the CHARMm force field. Diverse 
conformations, which accessed the conformational space defined within 
10 kcal/mol of the estimated global minimum, were generated for each 
compound using the poling algorithm. 
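
The energy window described above can be sketched in a few lines of Python. This is only an illustration of the 10 kcal cut-off, not of the poling algorithm, which additionally promotes conformational diversity; the conformer names and energies are hypothetical.

```python
def within_energy_window(conformers, window_kcal=10.0):
    """Keep the conformers whose energy lies within `window_kcal` of the lowest one.

    `conformers` is a list of (name, energy) pairs with energies in kcal/mol.
    """
    if not conformers:
        return []
    e_min = min(energy for _, energy in conformers)
    return [(name, energy) for name, energy in conformers
            if energy - e_min <= window_kcal]


# Hypothetical conformer energies (kcal/mol), for illustration only.
conformers = [("conf_a", 42.1), ("conf_b", 45.7), ("conf_c", 53.0), ("conf_d", 51.9)]

kept = within_energy_window(conformers)
print([name for name, _ in kept])
```

Here the lowest-energy conformer stands in for the estimated global minimum, and anything more than 10 kcal/mol above it is discarded.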
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SECTION-B 


RESULTS AND DISCUSSION 

Out of the several different estrogen antagonists reported, only the 
3,3,2-triphenylacrylonitriles (table-1) were considered for molecular 
modeling and 3D QSAR analysis by the Catalyst and APEX-3D expert systems 
running on an SGI Indy 4000 workstation, since this series exhibits the 
largest range (over five orders of magnitude) of RBA activities. Two 
different test sets (one with 1,1-diphenylethylenes and the other with 
2,3-diarylindenes) were used to assess the predictive ability of the 
3D QSAR models generated. 

Common feature hypotheses were generated to find the chemical 
features, in 3D space, shared by a set of potent antiestrogenic 
molecules and the crystal structure of 4-hydroxytamoxifen in its 
enzyme-bound conformation. The compound was extracted from lert of 
the Protein Data Bank. 4-Hydroxytamoxifen was taken as the template onto 
which all the molecules were forced to map (Principal = 2), and 10 of the 
most active molecules were also considered in generating the hypothesis 
(Principal value = 1). 

The pharmacophoric hypothesis represents the structural and 
functional criteria required for estrogenic activity. The crystal 
structures of beta-estradiol and raloxifene in their enzyme-bound 
conformations map well to the hypothesis, further validating the 
pharmacophore. 

All the molecules of the training and test sets were aligned onto 
the hypothesis, and the conformers were exported to APEX-3D for 
advanced 3D QSAR analysis. 

MATERIAL AND METHODS 


The molecules were stored in MDL format and were used for the 
computational calculation of different physicochemical properties, including 
atomic charges, π-population, electron donor and acceptor indexes, HOMO 
and LUMO coefficients, hydrophobicity and molar refractivity based on 
atomic contributions, by MOPAC version 6.0 (MNDO Hamiltonian). 
The compounds were classified into the following three classes: (i) most 
active (log RBA > 1.70), (ii) active (log RBA < 1.7) and (iii) less active 
(log IC50 < 1.0). The data were used by the APEX-3D programme for automated 
biophore (pharmacophore) identification and 3D-QSAR model building. The 
biophore (pharmacophore) identified automatically by APEX-3D is expressed 
in terms of a structural and electronic pattern: the local array of 
descriptor centres (such as user-defined atoms, pseudo-atoms like ring 
centres, hydrophobic regions or hydrogen bonding sites) that are common to 
a class of molecules in their bioactive conformation and are responsible 
for activity through interaction with the receptor. These biophores were 
used to derive 3D-QSAR equations with the site radius set at 1.20, the 
occupancy at 8, the sensitivity at 1.0 and the randomization at 100. The 
global properties (total hydrophobicity and total refractivity), the 
biophoric site properties (π-population, charge, HOMO, LUMO, hydrogen 
acceptor, hydrogen donor and hydrophobicity) and the secondary site 
parameters (hydrogen acceptor, presence; hydrogen donor, presence; 
heteroatom, presence; hydrophobic, hydrophobicity; steric, refractivity; 
ring, presence) were used as independent variables, and biological 
activity as the dependent variable, to derive equations for 3D-QSAR 
models [187-188]. 

The quality of each model was estimated from the observed R² 
(correlation coefficient between the experimentally observed and the 
calculated activity; APEX-3D uses the term approximated activity), RMSA 
(calculated root mean square error based on all compounds, with degrees 
of freedom correction), RMSP (root mean square error based on 'leave one 
out', with no degrees of freedom correction), chance statistics (evaluated 
as the ratio of the number of equivalent regression equations to the total 
number of randomized sets; a chance value of 0.1 corresponds to a 10% 
chance of fortuitous correlation) and the match parameter (evaluated for 
the quality of superimposition for molecules having common biophores; a 
value of 1 corresponds to the best possible fit, 100%). 
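
The difference between RMSA and RMSP can be made concrete with a short Python sketch. It is not the APEX-3D implementation: it assumes a one-descriptor least-squares fit, takes the degrees of freedom as the number of compounds minus the number of fitted parameters, and uses hypothetical data and function names.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def rmsa(x, y, n_params=2):
    """RMS error of the fitted values, corrected for degrees of freedom (n - n_params)."""
    slope, intercept = fit_line(x, y)
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sse / (len(x) - n_params))


def rmsp(x, y):
    """RMS error of leave-one-out predictions, with no degrees-of-freedom correction."""
    sse = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sse += (y[i] - (slope * x[i] + intercept)) ** 2
    return math.sqrt(sse / len(x))


# Hypothetical descriptor values and activities (not taken from the thesis).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1.1, -0.2, 0.3, 1.4, 1.8, 3.1]

print(f"RMSA = {rmsa(x, y):.3f}, RMSP = {rmsp(x, y):.3f}")
```

RMSA measures how well the equation reproduces the compounds it was fitted to, whereas RMSP measures how well each compound is predicted when it is left out of the fit.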


Several biophoric models were obtained with different sizes 
(number of biophoric sites) and arrangements (spatial orientation). Among 
these, only ten (10) models were considered, based on the statistical 
criteria (R² > 0.65, Chance < 0.1, Superimposition match > 0.7, 
RMSA < 0.9 and RMSP < 1.0) (table-2). 


TABLE 2 

STATISTICAL DETAILS OF THE BEST-SELECTED 
MODELS (MODEL NO. 1-10) 

S.N.  No.  RMSA  RMSP  R2    Chance  Size  Match  Variables  No. of compounds 
1     57   0.59  0.7   0.83  0       5     0.43   4          23 
2     13   0.59  0.62  0.82  0       6     0.48   4          24 
3     78   0.73  0.79  0.76  0       4     0.44   5          28 
4     12   0.73  0.97  0.75  0.1     7     0.29   5          23 
5     59   0.76  0.89  0.73  0.1     5     0.32   5          28 
6     49   0.73  0.8   0.72  0.1     6     0.26   4          24 
7     55   0.77  0.85  0.72  0.1     3     0.57   5          26 
8     77   0.83  1.03  0.68  0.1     3     0.68   4          24 
9     56   0.85  0.95  0.67  0.1     2     0.42   5          28 
10    61   0.87  1.02  0.66  0.1     4     0.34   5          28 


From the table it is clear that all the models 1-10 have high 
statistical significance (>99%), according to the chance and F values. 
Model number 8 was rejected because it did not include the most active 
molecule (4-hydroxytamoxifen). 

MODEL-1 


The five biophoric sites (1-5, white circles) common to all 
molecules in model no. 1 correspond to the carbonyl oxygen, its lone pair 
and the centre of the phenyl ring bearing the R1 substituents, 
respectively. The spatial disposition of these sites, in terms of 
inter-site distances, is BS1-BS2 = 3.00 (±0.001), BS1-BS3 = 3.74 (±0.02) 
and BS2-BS3 = 6.30 (±0.40) Å for model no. 1. The physicochemical 
characteristics of the biophore centres corresponding to the sites are 
BS1: Pi-Popul [1.839±0.498], Charge_Het [-0.113±0.053], Don_01 
[9.024±0.302]; BS2: Pi-Popul [1.839±0.498], Charge_Het [-0.113±0.053], 
Don_01 [9.024±0.302]; BS3: H-Site [1±0]; BS4: H-Site [1±0]; and BS5: 
Cycle_size [1±0], Pi-electron [1±0]. 
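
The inter-site distances quoted above are ordinary Euclidean distances between the biophoric centres. The following Python sketch shows the calculation; the coordinates are hypothetical and were chosen only so that they reproduce the three reported distances, they are not the coordinates of the actual biophore.

```python
import math
from itertools import combinations

# Hypothetical site coordinates in angstroms, placed in a plane so that the
# pairwise distances come out close to the values reported for model no. 1.
sites = {
    "BS1": (0.0, 0.0, 0.0),
    "BS2": (3.0, 0.0, 0.0),
    "BS3": (-2.784, 2.498, 0.0),
}


def intersite_distances(sites):
    """Euclidean distance for every pair of named sites."""
    return {(a, b): math.dist(sites[a], sites[b])
            for a, b in combinations(sites, 2)}


distances = intersite_distances(sites)
for (a, b), d in distances.items():
    print(f"{a}-{b} = {d:.2f}")
```

Because only inter-site distances are compared, the description of the biophore does not depend on how each molecule happens to be oriented in space.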


In addition to the identification of the five common key structural 
features described above as biophoric sites common to twenty-three 
molecules, three-dimensional multiparameter equations were derived using 
this pharmacophore as a template for superimposition. The in vitro 
activity, log (RBA), for estrogen receptor binding was related to four 
parameters (variables): HYDROPHOBICITY as a global (whole molecule) 
property and steric [REFRACTIVITY] at various secondary sites, as shown 
below. 

log (RBA) = 10.517 (±1.174) HYDROPHOBICITY 
            - 0.227 (±0.083) Steric [REFRACTIVITY] 
            - 0.823 (±0.149) Steric [REFRACTIVITY] 
            + 0.454 (±0.133) Steric [REFRACTIVITY] 
            - 0.980 

This model presented good predictions (R² and LOO R²) for the 
training set, as shown in the table and plot below. 

TABLE 3 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      0.99                 0.99 
JMC89_11     0.81      1.43                 1.49 
JMC89_12     0.78      1.06                 1.08 
JMC89_13     0.53     -0.42                -0.63 
JMC89_14     0.52      1.06                 1.11 
JMC89_16     0.34      0.07                -0.74 
JMC89_17     0.18     -1.2                 -1.58 
JMC89_18    -0.4      -0.42                -0.42 
JMC89_19    -0.44     -0.42                -0.41 
JMC89_1      2.22      2.21                 2.21 
JMC89_22    -1.4      -0.42                -0.2 
JMC89_23    -2        -1.2                 -0.98 
JMC89_24    -2        -1.57                -1.43 


MODEL-2 

In addition to the identification of the five common key structural 
features described above as biophoric sites common to twenty-four 
molecules, three-dimensional multiparameter equations were derived using 
this pharmacophore as a template for superimposition. The in vitro 
activity, log (RBA), for estrogen receptor binding was related to four 
secondary site parameters. 

log (RBA) = -2.994 (±0.671) PI-POPUL 
            + 9.161 (±1.055) HYDROPHOBICITY 
            + 7.693 (±2.457) Hydrophobic [HYDROPHOBICITY] 
            + 0.325 (±0.110) Steric [REFRACTIVITY] 
            + 0.697 

This model also presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 


TABLE 4 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      0.4                  0.37 
JMC89_11     0.81     -0.87                 1.77 
JMC89_12     0.78      0.25                 0.39 
JMC89_13     0.53      1.14                -0.79 
JMC89_14     0.52     -0.01                 0.54 
JMC89_15     0.4      -0.16                 0.65 
JMC89_16     0.34     -0.22                 0.67 
JMC89_17     0.18      0.79                -0.74 
JMC89_18    -0.4       0.21                -0.64 
JMC89_19    -0.44      0.17                -0.63 
JMC89_1      2.22      0.54                 1.63 
JMC89_22    -1.4      -0.79                -0.48 
JMC89_23    -2        -1.39                -0.38 
JMC89_24    -2        -0.24                -1.58 
JMC89_2      2.03      0.35                 1.65 
JMC89_3      1.97      0.29                 1.66 
JMC89_4      1.89      0.11                 1.77 
JMC89_5      1.87      0.19                 1.67 
JMC89_6      1.79      0.11                 1.67 
JMC89_7      1.56     -0.12                 1.7 
JMC89_8      1.45     -0.23                 1.71 
JMC89_9      1.23     -0.45                 1.73 
TAM_OHE1     0.36      0.13                 0.06 
TAM_PDB      2.38     -0.13                 2.68 

[Figures: plots and biophore diagrams (with biophoric site labels) for 
Model No 1 and Model No 2] 


MODEL-3 


The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Don_01 [8.163±0.254]; BS2: Cycle_size 
[6±0], Pi-electron [6±0]; BS3: Cycle_size [6±0], Pi-electron [6±0]; BS4: 
Cycle_size [6±0], Pi-electron [6±0]. 

In addition to the identification of the four common key structural 
features described above as biophoric sites common to twenty-eight 
molecules, three-dimensional multiparameter equations were derived using 
this pharmacophore as a template for superimposition. The in vitro 
activity, log (RBA), for estrogen receptor binding was related to four 
secondary site parameters. 

log (RBA) = 4.685 (±1.332) HYDROPHOBICITY 
            - 8.13 (±3.899) Hydrophobic [HYDROPHOBICITY] 
            - 0.867 (±0.238) Steric [REFRACTIVITY] 
            - 0.724 (±0.129) Steric [REFRACTIVITY] 
            - 0.428 (±0.106) Steric [REFRACTIVITY] 

The model also presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 


TABLE 5 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96     -1.17                 2.36 
JMC89_11     0.81     -0.14                 0.98 
JMC89_12     0.78      0.26                 0.47 
JMC89_13     0.53      0.51                -0.09 
JMC89_14     0.52     -0.13                 0.67 
JMC89_15     0.4      -0.25                 0.69 
JMC89_16     0.34     -0.31                 0.7 
JMC89_17     0.18      0.23                -0.1 
JMC89_18    -0.4       0.75                -1.36 
JMC89_19    -0.44     -0.75                 0.45 
JMC89_1      2.22      0.09                 2.11 
JMC89_20    -1.05      0.24                -1.43 
JMC89_21    -1.4      -0.11                -1.21 
JMC89_22    -1.4      -0.18                -1.16 
JMC89_23    -2        -1.14                -0.61 
JMC89_24    -2        -0.14                -1.78 
JMC89_2      2.03      1.08                 0.77 
JMC89_3      1.97     -0.16                 2.16 
JMC89_4      1.89      0.02                 1.85 
JMC89_5      1.87      0                    1.87 
JMC89_6      1.79     -0.34                 2.19 
JMC89_7      1.56      0.91                 0.51 
JMC89_8      1.45      0.8                  0.52 
JMC89_9      1.23      0.28                 0.91 
TAM_E1      -1.7      -1.18                -0.07 
TAM_OHE1     0.36      0.33                -0.05 
TAM_PDB      2.38      1.43                 0.71 
TAM_Z1       0.04     -0.91                 1.11 
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MODEL-4 


The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Pi-Popul [1.915±0.356], Charge_Het 
[-0.293±1.912], Don_01 [9.072±0.211]; BS2: Pi-Popul [0.288±0.022], 
Charge_Het [0.091±1.746], Don_01 [8.107±0.048]; BS3: H-Site [1±0]; 
BS4: H-Site [1±0]; BS5: Cycle_size [1±0], Pi-electron [1±0]; BS6: 
Cycle_size [1±0], Pi-electron [1±0]; BS7: Cycle_size [1±0], Pi-electron 
[1±0]. 

In addition to the identification of the four common key structural 
features described above as biophoric sites common to twenty-three 
molecules, three-dimensional multiparameter equations were derived using 
this pharmacophore as a template for superimposition. The in vitro 
activity, log (RBA), for estrogen receptor binding was related to four 
secondary site parameters. 

log (RBA) = 3.625 (±1.455) HYDROPHOBICITY 
            0.734 (±0.232) Steric [REFRACTIVITY] 
            - 0.248 (±0.105) Steric [REFRACTIVITY] 
            - 0.53 (±0.167) Steric [REFRACTIVITY] 
            + 0.297 (±0.09) Steric [REFRACTIVITY] 

The model also presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 

TABLE 6 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96     -0.78                 1.93 
JMC89_11     0.81      0.29                 0.39 
JMC89_12     0.78      0.22                 0.52 
JMC89_13     0.53     -0.03                 0.57 
JMC89_14     0.52     -0.9                  1.55 
JMC89_15     0.4       1.4                 -2.14 
JMC89_16     0.34     -0.22                 0.61 
JMC89_17     0.18      0.03                 0.12 
JMC89_18    -0.4      -0.06                -0.32 
JMC89_19    -0.44     -0.93                 0.72 
JMC89_1      2.22      0.48                 1.62 
JMC89_22    -1.4      -0.34                -0.93 
JMC89_23    -2        -1.51                 0.4 
JMC89_24    -2         0.02                -3.34 
JMC89_2      2.03      1.2                  0.42 
JMC89_3      1.97      0.23                 1.68 
JMC89_4      1.89      0.25                 1.6 
JMC89_5      1.87      0.23                 1.6 
JMC89_6      1.79      0.37                 1.37 
JMC89_7      1.56     -0.08                 1.66 

[Figures: plots for Model No 3 and Model No 4] 

MODEL-5 

The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Don_01 [8.166±0.253]; BS2: H-Site 
[1±0]; BS3: Cycle_size [6±0], Pi-electron [6±0]; BS4: Cycle_size [6±0], 
Pi-electron [6±0]; BS5: Cycle_size [6±0], Pi-electron [6±0]. 

In addition to the identification of the seven common key 
structural features described above as biophoric sites common to 
twenty-three molecules, three-dimensional multiparameter equations were 
derived using this pharmacophore as a template for superimposition. The 
in vitro activity, log (RBA), for estrogen receptor binding was related 
to five secondary site parameters. 

log (RBA) = -1.143 (±0.336) TOTAL_HYDROPHOBICITY 
            + 1.894 (±0.447) Hydrophobic [HYDROPHOBICITY] 
            - 7.789 (±3.42) Hydrophobic [HYDROPHOBICITY] 
            + 0.372 (±0.164) Steric [REFRACTIVITY] 
            - 0.408 (±0.158) Steric [REFRACTIVITY] 

The model also presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 


TABLE 7 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      0.91                 0.91 
JMC89_11     0.81      0.57                 0.55 
JMC89_12     0.78      1.43                 1.51 
JMC89_13     0.53     -0.2                 -0.43 
JMC89_14     0.52      1.14                 1.19 
JMC89_15     0.4      -0.76                -1.04 
JMC89_16     0.34      1.14                 1.21 
JMC89_17     0.18      0.12                 0.09 
JMC89_18    -0.4      -0.12                -0.04 
JMC89_19    -0.44      0.57                 0.65 
JMC89_1      2.22      2.55                 2.68 
JMC89_20    -1.05     -0.78                -0.57 
JMC89_21    -1.4      -1.4                 -1.41 
JMC89_22    -1.4      -0.71                -0.2 
JMC89_23    -2        -0.51                 0.32 
JMC89_24    -2        -2.31                -2.7 
JMC89_2      2.03      1.41                 1.28 
JMC89_3      1.97      1.69                 1.64 
JMC89_4      1.89      0.74                 0.66 
JMC89_5      1.87      1.43                 1.37 
JMC89_6      1.79      1.43                 1.38 
JMC89_7      1.56      1.14                 1.1 
JMC89_8      1.45      0.57                 0.5 
JMC89_9      1.23      2.21                 2.46 
TAM_E1      -1.7      -1.28                -0.57 
TAM_OHE1     0.36      0.51                 0.53 
TAM_PDB      2.38      1.29                 1.06 
TAM_Z1       0.04      0.17                 0.19 
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MODEL-6 


The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Pi-Popul [0.2889±0.0202], Charge_Het 
[-0.2744±0.0045], Don_01 [8.1019±0.0510]; BS2: Pi-Popul 
[1.7684±0.5971], Charge_Het [1.7684±0.5971], Don_01 [8.954±0.4521]; 
BS3: H-Site [1±0]; BS4: H-Site [1±0]; BS5: Cycle_size [6±0], Pi-electron 
[6±0]; BS6: Cycle_size [6±0], Pi-electron [6±0]. 

In addition to the identification of the seven common key 
structural features described above as biophoric sites common to all 
twenty-eight molecules, three-dimensional multiparameter equations were 
derived using this pharmacophore as a template for superimposition. The 
in vitro activity, log (RBA), for estrogen receptor binding was related 
to four secondary site parameters. 

log (RBA) = -1.142 (±0.334) TOTAL_HYDROPHOBICITY 
            + 4.841 (±1.118) Hydrophobic [HYDROPHOBICITY] 
            - 0.485 (±0.118) Steric [REFRACTIVITY] 
            + 0.601 (±0.143) Steric [REFRACTIVITY] 
            + 7.813 

This model too presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 


TABLE 8 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      0.14                 0.08 
JMC89_11     0.81      0.52                 0.49 
JMC89_12     0.78      0.66                 0.64 
JMC89_13     0.53     -0.14                -0.2 
JMC89_14     0.52      0.22                 0.19 
JMC89_15     0.4      -0.2                 -0.26 
JMC89_16     0.34      0.37                 0.37 
JMC89_17     0.18      0.66                 0.72 
JMC89_18    -0.4      -0.16                -0.07 
JMC89_19    -0.44      0.52                 0.65 
JMC89_1      2.22      1.9                  1.82 
JMC89_22    -1.4       0.6                  0.84 
JMC89_23    -2        -1.12                -0.73 
JMC89_24    -2        -1.89                -0.96 
JMC89_2      2.03      1.93                 1.88 
JMC89_3      1.97      1.91                 1.9 
JMC89_4      1.89      2.07                 2.16 
JMC89_5      1.87      2.33                 2.52 
JMC89_6      1.79      1.38                 1.32 
JMC89_7      1.56      2.04                 2.21 
JMC89_8      1.45      1.47                 1.48 
JMC89_9      1.23      0.66                 0.58 
TAM_OHE1     0.36     -0.26                -0.32 
TAM_PDB      2.38      1.42                 1.05 

[Figures: plots for Model No 5 and Model No 6] 


MODEL-7 


The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Pi-Popul [0.2917±0.0219], Charge_Het 
[-0.2747±0.0062], Don_01 [8.0934±0.0543]; BS2: Cycle_size [6±0], 
Pi-electron [6±0]; BS3: Cycle_size [6±0], Pi-electron [6±0]. 

In addition to the identification of the seven common key 
structural features described above as biophoric sites common to 
twenty-seven molecules, three-dimensional multiparameter equations were 
derived using this pharmacophore as a template for superimposition. The 
in vitro activity, log (RBA), for estrogen receptor binding was related 
to five secondary site parameters. 

log (RBA) = -2.053 (±0.372) TOTAL_HYDROPHOBICITY 
            + 5.913 (±2.497) Hydrophobic [HYDROPHOBICITY] 
            + 1.516 (±0.417) Hydrophobic [HYDROPHOBICITY] 
            - 0.296 (±0.121) Steric [REFRACTIVITY] 
            - 0.622 (±0.159) Steric [REFRACTIVITY] 
            + 13.511 

The model presented good predictions (R² and LOO R²) for the 
training set, as shown in the table and plot below. 

TABLE 9 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      1.94                 2.25 
JMC89_11     0.81      0.44                 0.41 
JMC89_12     0.78      0.92                 0.96 
JMC89_13     0.53     -0.75                -1.03 
JMC89_14     0.52      1                    1.04 
JMC89_15     0.4      -0.58                -1 
JMC89_16     0.34      0.44                 0.48 
JMC89_17     0.18      0.31                 0.38 
JMC89_18    -0.4      -0.4                 -0.4 
JMC89_19    -0.44     -0.32                -0.31 
JMC89_1      2.22      2                    1.91 
JMC89_22    -1.4      -0.9                 -0.28 
JMC89_23    -2        -1.96                -1.94 
JMC89_24    -2        -0.44                -0.13 
JMC89_2      2.03      0.54                 0.42 
JMC89_3      1.97      1.94                 1.93 
JMC89_4      1.89      1.48                 1.35 
JMC89_5      1.87      1.51                 1.47 
JMC89_6      1.79      1.98                 2 
JMC89_7      1.56      1                    0.95 
JMC89_8      1.45      0.86                 0.68 
JMC89_9      1.23      1.98                 2.08 
TAM_E1      -1.7      -0.75                -0.49 
TAM_OHE1     0.36      0.73                 0.89 
TAM_PDB      2.38      2.02                 1.83 



MODEL-9 

The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Cycle_size [6±0], Pi-electron [6±0]; 
BS2: Cycle_size [6±0], Pi-electron [6±0]. 

In addition to the identification of the seven common key 
structural features described above as biophoric sites common to 
twenty-seven molecules, three-dimensional multiparameter equations were 
derived using this pharmacophore as a template for superimposition. The 
in vitro activity, log (RBA), for estrogen receptor binding was related 
to five secondary site parameters. 

log (RBA) = -0.743 (±0.391) TOTAL_HYDROPHOBICITY 
            - 5.679 (±1.649) Hydrophobic [HYDROPHOBICITY] 
            - 0.524 (±0.14) Steric [REFRACTIVITY] 
            + 5.563 (±1.23) Steric [REFRACTIVITY] 
            + 4.284 (±1.607) Steric [REFRACTIVITY] 
            - 30.464 

The model too presented good predictions (R² and LOO R²) for 
the training set, as shown in the table and plot below. 

TABLE 10 

Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      1.59                 1.83 
JMC89_11     0.81     -0.18                -0.28 
JMC89_12     0.78      1.22                 1.29 
JMC89_13     0.53      [illegible]          0.72 
JMC89_14     0.52      [illegible]          1.09 
JMC89_15     0.4       [illegible]          0.5 
JMC89_16     0.34     -0.25                -0.48 
JMC89_17     0.18      1.61                 2.11 
JMC89_18    -0.4       0.22                 0.41 
JMC89_19    -0.44     -0.18                -0.16 
JMC89_1      2.22      3.12                 3.44 
JMC89_20    -1.05     -0.47                -0.28 
JMC89_21    -1.4      -0.45                -0.3 
JMC89_22    -1.4      -1.08                -0.85 
JMC89_23    -2         [illegible]         -2.29 
JMC89_24    -2        -1.29                -1.08 
JMC89_2      2.03      0.7                  0.58 
JMC89_3      1.97      1.59                 1.44 
JMC89_4      1.89      1.71                 1.67 
JMC89_5      1.87      0.63                 0.23 
JMC89_6      1.79      1.22                 1.14 
JMC89_7      1.56      1.04                 0.98 
JMC89_8      1.45      0.67                 0.59 
JMC89_9      1.23      1.22                 1.22 
TAM_E1      -1.7      -0.45                -0.25 
TAM_OHE1     0.36     -0.22                -0.28 
TAM_PDB      2.38      1.01                 0.47 
TAM_Z1       0.04     -0.08                -0.5 

MODEL-10 

The physicochemical properties of the biophoric centers 
corresponding to the sites are BS1: Don_01 [8.506±0.502]; BS2: H-Site 
[1±0]; BS3: Cycle_size [6±0], Pi-electron [6±0]; BS4: Cycle_size [6±0], 
Pi-electron [6±0]. 

In addition to the identification of the seven common key 
structural features described above as biophoric sites common to 
twenty-seven molecules, three-dimensional multiparameter equations were 
derived using this pharmacophore as a template for superimposition. The 
in vitro activity, log (RBA), for estrogen receptor binding was related 
to five secondary site parameters. 

log (RBA) = -1.596 (±0.367) TOTAL_HYDROPHOBICITY 
            + 11.229 (±5.154) Hydrophobic [HYDROPHOBICITY] 
            - 1.743 (±0.898) Hydrophobic [HYDROPHOBICITY] 
            - 7.936 (±2.012) Hydrophobic [HYDROPHOBICITY] 
            + 1.5 (±0.601) Hydrophobic [HYDROPHOBICITY] 
            + 8.738 

[Figures: plot and biophore diagram for Model No 10] 
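
As an illustration of how an equation of this form is applied to a new compound, the following Python sketch evaluates the Model 10 equation using the coefficients given above. The names of the four secondary-site terms and all descriptor values are hypothetical: the text does not say which secondary sites the four Hydrophobic [HYDROPHOBICITY] terms refer to, so they are simply numbered here.

```python
# Coefficients copied from the Model 10 equation; the secondary-site labels
# "hydrophobic_site_1" .. "hydrophobic_site_4" are placeholders.
MODEL_10_INTERCEPT = 8.738
MODEL_10_TERMS = [
    ("TOTAL_HYDROPHOBICITY", -1.596),
    ("hydrophobic_site_1", 11.229),
    ("hydrophobic_site_2", -1.743),
    ("hydrophobic_site_3", -7.936),
    ("hydrophobic_site_4", 1.5),
]


def predict_log_rba(descriptors):
    """Evaluate the linear 3D-QSAR equation for one compound."""
    return MODEL_10_INTERCEPT + sum(
        coefficient * descriptors[name] for name, coefficient in MODEL_10_TERMS
    )


# Hypothetical descriptor values for a single compound.
example = {
    "TOTAL_HYDROPHOBICITY": 5.0,
    "hydrophobic_site_1": 0.2,
    "hydrophobic_site_2": 0.5,
    "hydrophobic_site_3": 0.3,
    "hydrophobic_site_4": 0.4,
}

print(f"predicted log RBA = {predict_log_rba(example):.4f}")
```

Each term contributes its coefficient multiplied by the descriptor value at the corresponding site, so the sign of a coefficient shows whether hydrophobicity at that site raises or lowers the predicted binding affinity.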
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TABLE-11 


Molecule    log RBA   Calculated log RBA   Predicted log RBA 
JMC89_10     0.96      0.36                 0.33 
JMC89_11     0.81     -0.12                -0.17 
JMC89_12     0.78      1.08                 1.11 
JMC89_13     0.53      0.84                 1.13 
JMC89_14     0.52      0.68                 0.69 
JMC89_15     0.4      -1.43                -2.14 
JMC89_16     0.34      0.68                 0.7 
JMC89_17     0.18      1.08                 1.19 
JMC89_18    -0.4      -1.08                -1.2 
JMC89_19    -0.44     -0.12                -0.1 
JMC89_1      2.22      2.59                 3.02 
JMC89_20    -1.05      0.2                  0.27 
JMC89_21    -1.4      -0.68                -0.6 
JMC89_22    -1.4      -0.48                 0.76 
JMC89_23    -2        -1.4                 -1.23 
JMC89_24    -2        -0.44                -0.31 
JMC89_2      2.03      1.94                 1.91 
JMC89_3      1.97      2.22                 3.17 
JMC89_4      1.89      1.58                 1.5 
JMC89_5      1.87      1.08                 0.98 
JMC89_6      1.79      1.08                 0.99 
JMC89_7      1.56      0.68                 0.61 
JMC89_8      1.45      1.1                  0.77 
JMC89_9      1.23      1.25                 1.25 
TAM_E1      -1.7      -0.68                -0.57 
TAM_OHE1     0.36     -0.2                 -0.23 
TAM_PDB      2.38      1.78                 1.52 
TAM_Z1       0.04     -0.68                -0.76 


VALIDATION OF THE MODELS 

The models generated were validated for their predictive 
ability against a test set; two series of compounds, which were not 
part of the training set, were used as the test set. 

Predictions for the test set of 26 compounds by the hypotheses 1-10, 
derived using the training set of 28 compounds. 

TABLE 12 

(ACT = observed activity; "-" = no value given) 

Test 
molecule  ACT     B57     B13     B78     B12     B59     B49     B55     B56     B61 
1          1.58    0.39    7.34    2.13    1.42    0.29   -0.46    0.8    -0.17   -0.28 
2         -0.16    4.27    2.52    0.65    2.45    1.65    2.56    2.96    3.6     1.4 
3         -0.22    4.52    2.52    2.13    1.42    1.49    0.74    2.96    0.92    1.4 
4         -0.34    0.34    6.71    3.34    -       1.31    2.22    1.12    0.61    0.92 
5          1.48    4.52    2.52    0.65    1.42    0.91    -       -       0.24    0.36 
6          1.45    0.34    6.08    1.87    2.78    1.26    2.33    1.33    0.71    1.08 
7          1.17    0.64    2.52    0.65   -0.57    1.77    2.67    2.13    1.02    5.52 
8          1.08    -       -       2.13    -       0.74    -       0.28    0.09    0.12 
9          0.95    -       -       2.13    -       1.14    -       1       1.13    0.68 
10         0.67    5.05    2.52    2.13    1.42    1.43    0.66    3.66    1.39    1.08 
11         0       4.27    2.52    2.13    2.78    1.43    0.44    2.64    0.71    1.08 
12        -0.12    4.81    2.52    2.13    1.42    2.95    1.42    2.02    0.56   -0.2 
13         0      -1.56    -       -       -       -       1.82    2.14    0.24    - 
14        -1       0.39    -       -       -       -       0.79    0.3     1.64   -1.08 
15        -1.16    2.36    -       -       -       -       1.59    2.32    0.04    - 
16        -1.46    2.75    -       -       -       -      -0.53   -2.06   -0.79   -2.91 
17        -1.6     4.27    -       -       -       -      -0.53   -2.06   -1.88   -4.1 
18        -1.7    -1.57    -       -       -       -       0.39   -0.42   -0.13   -1.64 
19        -0.13    1.87    -       -       -       -      -0.43    1.12   -0.27    - 
20        -0.59    4.15    5       0.65    2.45   -0.12   -0.89    0.3    -0.69   -1.08 
21        -0.68    4.27    -       -       -       -       1.24    1.12   -0.27   -0.44 
22        -0.68    2.36    -       -       -       -       0.79    0.3    -0.69    - 
23        -0.7    -2.01    -       -       -       -       3.33    -       -       - 
24        -0.82    2.45    3.47    0.7     2.45   -0.12   -0.89   -0.13   -0.69   -1.08 
25        -0.89    0.77    -       -       -       -       1.82    2.14    1.17    0.36 
26        -0.89   -0.04    -       -       -       -       0.39   -0.42   -1.05    - 
Correl             0.078   0.201   0.040   0.405   0.029   0.265   0.459   0.339   0.638 
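
The way a correlation between observed and predicted test-set activities can be obtained when some compounds have no prediction is sketched below in Python. It is an assumption that the Correl row is a Pearson-type correlation; the text does not state the exact statistic. The activity values in the example are hypothetical, with `None` marking a compound for which a model gave no value.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation over the compounds for which a prediction exists."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical observed and predicted activities; None marks a missing prediction.
actual = [1.5, 1.0, 0.0, -0.5, -1.0, -1.5]
predicted = [1.2, None, 0.3, -0.2, -0.9, -1.1]

r = pearson_r(actual, predicted)
print(f"r = {r:.3f}")
```

Compounds without a prediction are skipped rather than treated as zero, so models that map fewer test molecules are compared on fewer points.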


Model no. 10 demonstrated good predictive ability against the 
test set and hence was chosen as the best model to explain the 
estrogen receptor binding affinity. 
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CONCLUSION 


The QSAR studies have been successfully applied to a set of 
compounds to generate the essential structural and physicochemical 
requirements, in terms of common biophoric sites (pharmacophore) and 
secondary sites, for binding and interacting with the receptors. The 
APEX-3D model reveals regions in 3D space around these ligands and 
provides a hypothetical picture of the main chemical features, viz. 
Pi_population, Charge_Het and Don_01 for the first biophoric site (BS1) 
and Cycle_size and Pi_electron for the second and third biophoric sites 
(BS2 and BS3), respectively. The analysis also shows a significant 
correlation of hydrophobic and steric factors with biological activity. 

Catalyst has led to the generation of bioactive conformations, and 
the hypothesis obtained identified important features (such as 
hydrophobic, aromatic hydrophobic, hydrogen bond acceptor and steric 
refractivity) of the surface-accessible models. The hydrophobic and 
positive ionizable features are the minimum components of an effective 
estrogen antagonist binding hypothesis. 

The model has led to predictions, identification of the 
pharmacophore and an improved understanding of receptor topography 
in terms of interaction or binding sites. 
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